CN112966027B

CN112966027B - Entity association mining method based on dynamic probe

Info

Publication number: CN112966027B
Application number: CN202110302533.5A
Authority: CN
Inventors: 陶冶; 郭帅童; 丁香乾; 侯瑞春; 李辉; 史操
Original assignee: Qingdao University of Science and Technology
Current assignee: Qingdao University of Science and Technology
Priority date: 2021-03-22
Filing date: 2021-03-22
Publication date: 2022-10-21
Anticipated expiration: 2041-03-22
Also published as: CN112966027A

Abstract

The invention discloses an entity association mining method based on a dynamic probe. The method comprises: configuring a probe to monitor the interaction data between an application system and a database; processing the intercepted data to form formatted entity data and storing it in a relational database; performing feature fusion between the entity and the existing entities in the relational database, in which the attribute information similarity, the attribute value similarity and the log similarity of the two compared entities are calculated respectively; and then obtaining the degree of similarity of the two compared entities by fuzzy logic reasoning over the calculated attribute information similarity, attribute value similarity and log similarity, thereby realizing association matching between entities. By using a dynamic probe, the invention obtains the data exchanged between enterprise services and the database, measures the similarity between different entities through multi-dimensional features, and uses fuzzy logic reasoning to give the best matching result between entities, thereby saving manual matching time.

Description

Entity association mining method based on dynamic probe

Technical Field

The invention belongs to the technical field of data management, and particularly relates to a mining method for realizing relevance between entities in a heterogeneous information system.

Background

With the growth of enterprise business and the advance of informatization, massive data has accumulated in the databases of various business systems, and this data is generally multi-source, heterogeneous and autonomous. Using an appropriate data fusion technology to integrate the fragmented data scattered across multiple systems into a comprehensive and accurate enterprise data space helps break down the information silos between systems and provides effective support for mining data associations in depth, constructing knowledge graphs and realizing comprehensive and efficient data sharing.

Cross-system entity association is an important link in the data integration process. In traditional data warehouse construction, cross-system entity association matching generally requires multiple steps: database administrators acquire the corresponding data according to requirements set by business personnel, and professionals with the relevant background assist in analyzing and processing the data, so that associations between entities are confirmed and matched manually. For example, the price field of a table in system A and the unit_price field of a table in system B actually describe the price of a product and the price of a component of that product; since the price of the component directly affects the price fluctuation of the product, the data in the two fields are closely related. With current technology, however, such associations are typically discovered and matched manually. In a business system of considerable scale there are often thousands of entity attributes, and relying entirely on manual discovery of the associations between data is very time-consuming.

In addition, as enterprise business continues to develop, entity association matching results lag behind and must be adjusted continuously according to the specific situation. If the associations between entities can be discovered and matched automatically, the manual matching time of the personnel involved can be saved.

Disclosure of Invention

The invention aims to provide an entity association mining method based on a dynamic probe, which can automatically discover and match the association of each attribute between entities.

In order to solve the technical problems, the invention adopts the following technical scheme to realize:

an entity association mining method based on a dynamic probe comprises the following processes:

configuring a probe, and intercepting request information of an application system to a database and corresponding response data;

processing the sensed data to form entity formatted data and storing the entity formatted data into a relational database;

and performing feature fusion on the entity and the existing entity in the relational database, wherein the process comprises the following steps:

calculating the similarity of the attribute information of the two comparison entities;

calculating the similarity of the attribute values of the two compared entities;

calculating the log similarity of two comparison entities;

and obtaining the similarity of the two comparison entities by using a fuzzy logic reasoning method according to the attribute information similarity, the attribute value similarity and the log similarity obtained by calculation.

Compared with the prior art, the invention has the advantages and positive effects that: the invention adopts an entity association mining method based on a dynamic probe to obtain related data stored by enterprise business and a database, measures the similarity between different entities through multi-dimensional characteristics, gives the best matching result between the entities by adopting a fuzzy logic reasoning method, and provides the best matching result for related personnel as reference, thereby saving the manual matching time of the related personnel and improving the working efficiency.

Other features and advantages of the present invention will become more apparent from the detailed description of the embodiments of the present invention when taken in conjunction with the accompanying drawings.

Drawings

FIG. 1 is a general architecture diagram of an embodiment of a dynamic probe-based entity association mining method proposed by the present invention;

FIG. 2 is a data processing flow diagram of one embodiment of a dynamic probe;

FIG. 3 is a flow diagram of one embodiment of an attribute information analysis process;

FIG. 4 is a diagram of one embodiment of a tree semantic hierarchy;

FIG. 5 is a flow diagram of one embodiment of an attribute value analysis process;

FIG. 6 is a flow diagram of one embodiment of a log analysis process;

FIG. 7 is a flow diagram of one embodiment of a comparison entity similarity determination process;

FIG. 8 is a tree diagram illustration of a specific example.

Detailed Description

The following detailed description of embodiments of the invention refers to the accompanying drawings.

As shown in fig. 1, the entity association mining method of this embodiment mainly comprises data interception, attribute information analysis, attribute value analysis, log analysis, and similarity determination based on fuzzy logic. A dynamic probe is configured to continuously monitor the request information sent by the application system to the database and the corresponding response data, from which formatted entity data is formed and stored in a relational database. Feature fusion is then performed between the entity data newly stored in the relational database and the entity data already in it: the similarity of the two compared entities is analyzed in three dimensions (attribute information, attribute values and log access), and fuzzy logic reasoning then gives the degree of similarity between the two compared entities, realizing automatic association matching between them.

The data interception and feature fusion processes of the dynamic probe are described in detail below.

A mature service architecture generally separates data reads from data writes during service development. Enterprise software architecture is generally divided into a business logic layer, a database access layer and a database storage layer. With the database access layer, business personnel who request access to the database read it through the API provided by the database management software, without needing to care about the operating details of the underlying database.

As shown in fig. 2, this embodiment first loads a front probe on the middleware to intercept the interaction data between the application system and the database. Specifically, a bidirectional dynamic probe can be configured to listen to the request information sent by the business logic layer to the database and the corresponding response data. The data obtained by the bidirectional dynamic probe is then cleaned and organized to remove data noise, format confusion and similar problems, and is uploaded to a relational database as the formatted data of an entity for feature fusion processing.
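By way of illustration only, the interception step can be sketched as a thin wrapper that forwards each request to the database and records both the request and the response. The `Probe` class, its `records` field and the use of an in-memory SQLite database are assumptions made for this sketch; the source does not specify how the probe is implemented or which middleware hosts it.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Probe:
    """Hypothetical bidirectional probe: records each request and its response."""
    records: list = field(default_factory=list)

    def execute(self, conn, sql, params=()):
        cur = conn.execute(sql, params)      # forward the request unchanged
        rows = cur.fetchall()                # capture the response data
        # cur.description is None for statements that return no result set
        columns = [d[0] for d in cur.description] if cur.description else []
        self.records.append({"sql": sql.strip(), "columns": columns, "rows": rows})
        return rows


conn = sqlite3.connect(":memory:")
probe = Probe()
probe.execute(conn, "CREATE TABLE product (productID INTEGER PRIMARY KEY, price REAL)")
probe.execute(conn, "INSERT INTO product VALUES (?, ?)", (1, 9.5))
rows = probe.execute(conn, "SELECT productID, price FROM product")
```

Each entry in `probe.records` then carries the attribute names, the attribute values and the SQL text, which correspond to the three kinds of information analyzed in the feature fusion stage.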

Feature fusion refers to association matching between entities. The attribute information in the formatted data acquired by the dynamic probe is stored in a database, the attribute values are stored according to the organization format of the source database, and the log information is stored as files.

First, entities may be pre-classified according to the data types listed in Table 1: entities belonging to the same data type may be similar, whereas entities of different data types are generally not similar.

Data type	Members
Integer	SMALLINT, MEDIUMINT, INT, BIGINT
Floating point	FLOAT, DOUBLE, DECIMAL
Date	YEAR, DATE, TIME, DATETIME
Character	CHAR, VARCHAR, BLOB, TEXT

TABLE 1
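The pre-classification can be sketched as a lookup from a column's declared type to the data type families of Table 1. The function names `type_family` and `are_candidates` are hypothetical, and stripping a length suffix such as `(32)` is an assumption about how the declared types are written.

```python
# Data type families taken from Table 1.
TYPE_FAMILIES = {
    "integer": {"SMALLINT", "MEDIUMINT", "INT", "BIGINT"},
    "floating point": {"FLOAT", "DOUBLE", "DECIMAL"},
    "date": {"YEAR", "DATE", "TIME", "DATETIME"},
    "character": {"CHAR", "VARCHAR", "BLOB", "TEXT"},
}


def type_family(sql_type):
    """Return the Table 1 family of a declared type, or None if it is not listed."""
    base = sql_type.split("(")[0].strip().upper()   # "varchar(32)" -> "VARCHAR"
    for family, members in TYPE_FAMILIES.items():
        if base in members:
            return family
    return None


def are_candidates(type_a, type_b):
    """Only attributes in the same family go on to the similarity analysis."""
    family = type_family(type_a)
    return family is not None and family == type_family(type_b)
```

Filtering on the family first keeps the later, more expensive comparisons restricted to pairs that can plausibly match.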

Then, the formatted data uploaded by the dynamic probe is analyzed in three dimensions (attribute information, attribute values and logs) to obtain the degree of similarity between entities.

(I) Analysis of attribute information

The attribute information may be divided into an attribute name and an attribute constraint. Attribute constraints typically include data type, whether it can be null, whether it is a primary key, whether it is a foreign key, comments, and the like. Based on this information, the data can be classified and matched, and the matching process preferably includes the following aspects, as shown in fig. 3:

1. Calculating naive text similarity

In this embodiment, the edit distance is preferably used to measure the degree of similarity between two character sequences. The edit distance between two words is the minimum number of single-character edit operations required to convert one word into the other. The naive text similarity S₁ is calculated from the number of edits as follows:

$$S_1 = 1 - \frac{D}{\mathrm{Max}(l_1, l_2)} \tag{1}$$

where w₁ and w₂ are the attribute names of the two compared entities; l₁ and l₂ are the character lengths of the attribute names w₁ and w₂, respectively; D is the edit distance between w₁ and w₂; and Max is the function that takes the maximum value.
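A minimal sketch of this step follows, assuming formula (1) has the form S₁ = 1 − D / Max(l₁, l₂); that form reproduces the value 0.22 reported for productID and companyID in the worked example of this description. The function names are hypothetical, and the edit distance is the standard Levenshtein distance computed by dynamic programming.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character edits."""
    prev = list(range(len(b) + 1))           # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1]


def naive_text_similarity(w1, w2):
    """Assumed form of formula (1): 1 - D / max(l1, l2)."""
    longest = max(len(w1), len(w2))
    if longest == 0:
        return 1.0                           # two empty names are identical
    return 1 - edit_distance(w1, w2) / longest
```

For example, `naive_text_similarity("productID", "companyID")` evaluates 1 − 7/9.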

2. Calculating text semantic similarity

Because application scenarios, naming conventions and the like differ, the same entity may be described in different ways. For example, when information about an upstream company is recorded in an enterprise database, the attribute may be named companyID in one scenario and SupplierID in another. In such cases it is difficult to find the similarity between the attribute names by naive text analysis alone. It is therefore preferable to further determine the association between the attribute names of the two entities with an analysis method based on semantic similarity.

Specifically, the lexical dictionary provided by WordNet may be used to build a tree-like semantic hierarchy for the attribute names, such as the tree diagram shown in fig. 4, and the similarity between attribute names is calculated from their positions in the tree diagram.

The specific calculation formula is as follows:

$$S_2 = \frac{2H}{N_1 + N_2 + 2H} \tag{2}$$

where N₁ and N₂ denote the shortest path lengths from the attribute names w₁ and w₂, respectively, to their nearest common parent node w; and H denotes the shortest path length from w to the root node.
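The path-based calculation can be sketched on a small hand-built hierarchy, assuming formula (2) has the form S₂ = 2H / (N₁ + N₂ + 2H), which gives 0.14 for the values H = 7, N₁ = 40 and N₂ = 46 used in the worked example. The `PARENT` mapping below is a toy tree invented for illustration, not WordNet data, and the function names are hypothetical.

```python
def path_similarity(n1, n2, h):
    """Assumed form of formula (2): 2H / (N1 + N2 + 2H)."""
    denom = n1 + n2 + 2 * h
    return 1.0 if denom == 0 else 2 * h / denom   # same node at the root


def path_to_root(node, parent):
    """List of nodes from `node` up to the root of the hierarchy."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def semantic_similarity(w1, w2, parent):
    p1, p2 = path_to_root(w1, parent), path_to_root(w2, parent)
    in_p2 = set(p2)
    common = next(n for n in p1 if n in in_p2)    # nearest common parent node w
    n1, n2 = p1.index(common), p2.index(common)   # path lengths N1 and N2
    h = len(path_to_root(common, parent)) - 1     # path length H from w to root
    return path_similarity(n1, n2, h)


# Toy hierarchy (child -> parent); assumes a single root, here "entity".
PARENT = {"organization": "entity", "company": "organization",
          "supplier": "organization", "product": "entity"}
```

In this toy tree, `company` and `supplier` share the parent `organization` one level below the root, so their similarity is 2·1 / (1 + 1 + 2·1) = 0.5, while `company` and `product` meet only at the root and score 0.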

3. Calculating attribute name similarity

After the naive text similarity S₁ and the text semantic similarity S₂ have been calculated, the larger of the two can be taken as the attribute name similarity S₃, namely:

$$S_3 = \mathrm{Max}(S_1, S_2) \tag{3}$$

where Max is the function that takes the maximum value.

4. Calculating attribute constraint similarity

When attribute information is created, its constraints usually follow certain design principles, for example: the data type, whether the attribute may be null, whether it is a primary key, whether it is a foreign key, and so on, as shown in table 2.

Candidate constraint	i=1	i=2	i=3	i=4	i=5
Definition	Data type	Nullable	Primary key	Foreign key	Comment

TABLE 2

Define the attribute constraint vectors of the two compared entities as A and B, where A_i and B_i denote the values of the i-th candidate constraint in vector A and vector B, respectively, and let:

$$v_i = \begin{cases} 1, & A_i = B_i \\ 0, & \text{otherwise} \end{cases} \qquad i = 1, \dots, n \tag{4}$$

where n is the number of candidate constraints in vectors A and B, and "otherwise" denotes every case other than A_i = B_i. For example, A₁ and B₁ denote the data types in the two attribute constraints: if A₁ and B₁ are both integer, then A₁ = B₁ and v₁ = 1; if A₁ is integer and B₁ is floating point, then A₁ ≠ B₁ and v₁ = 0; and so on.

The attribute constraint similarity S₄ is calculated as follows:

$$S_4 = \frac{1}{n} \sum_{i=1}^{n} v_i \tag{5}$$

5. Calculating attribute information similarity

Because the attribute information comprises the attribute name and the attribute constraints, a weighted sum can be used to calculate the attribute information similarity S₅ of the two compared entities:

$$S_5 = \alpha \cdot S_3 + \beta \cdot S_4 \tag{6}$$

where α and β are weights that can be assigned differently according to the situation, with 0 ≤ α ≤ 1, 0 ≤ β ≤ 1 and α + β = 1.
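The constraint comparison and the weighting of formula (6) can be sketched as follows. The sketch assumes that the attribute constraint similarity is the fraction of candidate constraints on which the two entities agree, which is consistent with the worked example (v = [1,1,1,1,1] giving S₄ = 1); the tuple layout follows table 2, and the function names and sample constraint values are hypothetical.

```python
def constraint_similarity(a, b):
    """Fraction of candidate constraints with equal values (assumed form of S4)."""
    if len(a) != len(b) or not a:
        raise ValueError("constraint vectors must be non-empty and of equal length")
    v = [1 if x == y else 0 for x, y in zip(a, b)]   # v_i = 1 if A_i == B_i else 0
    return sum(v) / len(v)


def attribute_info_similarity(s3, s4, alpha=0.5, beta=0.5):
    """Formula (6): S5 = alpha * S3 + beta * S4 with alpha + beta = 1."""
    if not (0 <= alpha <= 1 and 0 <= beta <= 1 and abs(alpha + beta - 1) < 1e-9):
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    return alpha * s3 + beta * s4


# (data type, nullable, primary key, foreign key, comment), as in table 2
A = ("INT", False, True, False, "identifier")
B = ("INT", False, True, False, "identifier")
s4 = constraint_similarity(A, B)
s5 = attribute_info_similarity(0.22, s4)   # 0.5 * 0.22 + 0.5 * 1
```

With S₃ = 0.22 and equal weights this gives 0.61, the attribute information similarity used in the worked example.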

(II) Analysis of attribute values

According to their data types, attribute values can be divided into four types: numerical, character, enumerated and text. The attribute value similarity is calculated differently for each type, as described below with reference to fig. 5.

1. Numerical attribute values

When the attribute values of both compared entities are numerical, the similarity between the entities can be considered from the point of view of their numerical distributions. In this embodiment, the mean, median, mode, sample standard deviation, maximum and minimum may be selected as the feature vector elements. Of course, only some of them may be selected, or other statistics may be used as feature vector elements; the embodiment is not limited to the above examples.

The feature vectors of the attribute values of two comparison entities are represented by u and v, respectively, and the definition of the feature vector elements is shown in table 3:

Mean	Median	Mode	Sample standard deviation	Maximum	Minimum
u₁	u₂	u₃	u₄	u₅	u₆
v₁	v₂	v₃	v₄	v₅	v₆

TABLE 3
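Building the feature vector of table 3 can be sketched with Python's standard `statistics` module. The function name `numeric_features` and the sample column (the integers 1 to 100) are assumptions made for illustration and are not data from the source; the similarity formula applied to the two vectors is not part of this sketch.

```python
import statistics


def numeric_features(values):
    """Feature vector in the order of table 3:
    mean, median, mode, sample standard deviation, maximum, minimum."""
    values = list(values)
    return [
        statistics.mean(values),
        statistics.median(values),
        # With several equally common values, Python 3.8+ returns the first one
        # encountered; earlier versions raise StatisticsError instead.
        statistics.mode(values),
        statistics.stdev(values),   # sample (n - 1) standard deviation
        max(values),
        min(values),
    ]


u = numeric_features(range(1, 101))   # illustrative column 1..100
v = numeric_features(range(1, 57))    # illustrative column 1..56
```

Because every element is a plain summary statistic, the two columns can be compared without aligning individual rows, which suits attributes that live in different systems.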

The statistic corresponding to each feature vector element is substituted into formula (7) to calculate the attribute value similarity S₆ of two compared entities whose attribute values are numerical:

[Formula (7) appears as an image in the original publication and is not reproduced here.]

where m is the number of feature vector elements.

2. Character type attribute value

Character-type attribute values are short text content, for which the term frequency-inverse document frequency (TF-IDF) can be used as the basis for judging similarity.

Specifically, the attribute values of the two compared entities are first merged to form a corpus; the TF-IDF algorithm is then used to calculate the TF-IDF weights of each entity's attribute values, forming the vectors U and V; finally, U and V are substituted into formula (8) to calculate the attribute value similarity S₇ of the two compared entities:

[Formula (8) appears as an image in the original publication and is not reproduced here.]
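A rough sketch of the TF-IDF step is given below. Two points are assumptions rather than statements of the source: the inverse document frequency is smoothed so that terms shared by both entities keep a non-zero weight in a two-document corpus, and the vectors U and V are compared with cosine similarity, because formula (8) is not available in this text. The function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(values_a, values_b):
    """Treat each entity's attribute values as one document of a two-document corpus."""
    docs = [Counter(tok for value in values for tok in value.lower().split())
            for values in (values_a, values_b)]
    vocab = sorted(set(docs[0]) | set(docs[1]))
    vectors = []
    for counts in docs:
        total = sum(counts.values()) or 1
        vec = []
        for term in vocab:
            df = sum(1 for d in docs if term in d)            # document frequency
            idf = math.log((1 + len(docs)) / (1 + df)) + 1    # smoothed IDF (assumption)
            vec.append(counts[term] / total * idf)            # TF * IDF
        vectors.append(vec)
    return vectors


def cosine(u, v):
    """Cosine similarity, used here as a stand-in for formula (8)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


U, V = tfidf_vectors(["red apple", "green apple"], ["green apple", "red pear"])
s7 = cosine(U, V)
```

Whitespace tokenization is the simplest choice for short values; a different tokenizer can be substituted without changing the rest of the sketch.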

3. Enumerated attribute values

An enumerated attribute value comprises at least two data items. The attribute values of the two compared entities can be converted into two sets A and B, and the ratio of the size of their intersection to the size of their union is calculated with formula (9) and taken as the enumerated attribute value similarity S₈:

$$S_8 = \frac{|A \cap B|}{|A \cup B|} \tag{9}$$

where ∩ is the intersection symbol, ∪ is the union symbol, and |·| denotes the number of elements in a set.
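The ratio of intersection to union can be sketched directly with Python sets. The function name and the sample status codes are hypothetical, and returning 0 for two empty sets is an assumption made to avoid division by zero.

```python
def enum_similarity(values_a, values_b):
    """Ratio of intersection size to union size of the two value sets."""
    a, b = set(values_a), set(values_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


s8 = enum_similarity(["new", "paid", "shipped"], ["paid", "shipped", "closed"])
```

Here the two attributes share two of four distinct values, so `s8` is 0.5.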

4. Text type attribute value

For attribute values that are long text content, a mathematical model can be built with an autoencoder algorithm from deep learning, trained on part of the data in the attribute values, and then used to calculate the attribute value similarity of the two compared entities.

Specifically, the following steps may be included:

(a) Randomly select k data items from the p data items in the attribute values of one entity to form a training set, and form a test set from the remaining p−k items; then use the training set to train the mathematical model built with the autoencoder algorithm, obtaining the trained model;

(b) Given a predefined threshold ω, a data item in the test set is considered similar to the training set if the similarity result computed for it by the trained model is greater than ω; the ratio of the number of test-set items judged similar to the training set to k is calculated and recorded as λ;

(c) Form a test set from all the data in the attribute values of the other entity, repeat step (b), and record the resulting ratio as θ;

(d) Calculate the attribute value similarity S₉ of the two compared entities with formula (10):

[Formula (10) appears as an image in the original publication and is not reproduced here.]

where Min is the function that takes the minimum value.

(III) Analysis of logs

A log file is generated during each interaction between the application system and the database. Once the log file has been stored in the relational database and formatted, the SQL commands it contains carry equivalence relations between entities and can serve as a basis for measuring entity similarity. The similarity between entities can be obtained by counting the equivalence relations in the log file, as shown in fig. 6.

Specifically, letting a and b be the two compared entities, their log similarity S₁₀ is calculated as follows:

$$S_{10} = \frac{N_{ab}}{N_a + N_b}$$

where N_a and N_b are the numbers of times the attribute names and/or attribute values of entity a and entity b, respectively, appear in the log file; specifically, N_a can be counted as the number of SQL commands in the log that contain the attribute name and/or attribute value of entity a, and N_b as the number of SQL commands in the log that contain the attribute name and/or attribute value of entity b;

N_ab is the number of times the attribute names and/or attribute values of entities a and b appear together in the log file; specifically, it can be counted as the number of SQL commands in the log that contain the attribute names and/or attribute values of both a and b.
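The counting can be sketched as follows, assuming the log similarity has the form N_ab / (N_a + N_b); this form agrees with the worked example, in which 447, 389 and 328 occurrences give about 0.39. The function names and the three sample SQL commands are hypothetical, and case-insensitive substring matching is a simplification (a name that is a prefix of another name would be over-counted).

```python
def count_occurrences(sql_commands, name_a, name_b):
    """Count SQL commands containing name_a, name_b, and both (N_a, N_b, N_ab)."""
    a, b = name_a.lower(), name_b.lower()
    n_a = n_b = n_ab = 0
    for sql in sql_commands:
        text = sql.lower()
        has_a, has_b = a in text, b in text
        n_a += has_a
        n_b += has_b
        n_ab += has_a and has_b
    return n_a, n_b, n_ab


def log_similarity(n_a, n_b, n_ab):
    """Assumed form of the log similarity: N_ab / (N_a + N_b)."""
    total = n_a + n_b
    return n_ab / total if total else 0.0


LOG = [
    "SELECT p.productID FROM product p JOIN company c ON p.companyID = c.companyID",
    "SELECT productID, price FROM product",
    "UPDATE company SET name = 'x' WHERE companyID = 3",
]
counts = count_occurrences(LOG, "productID", "companyID")
s10 = log_similarity(*counts)
```

In the sample log each name appears in two commands and both appear together in one, so `counts` is `(2, 2, 1)` and `s10` is 0.25.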

(IV) Determining the similarity of the two compared entities

The calculated attribute information similarity S₅, attribute value similarity (S₆, S₇, S₈ or S₉, depending on the data type) and log similarity S₁₀ are evaluated with a fuzzy logic reasoning method to obtain the degree of similarity of the two compared entities.

Referring to fig. 7, the process is as follows:

First, membership functions are used to fuzzify the attribute information similarity, the attribute value similarity and the log similarity of the two compared entities, and the membership degrees are calculated. A triangular membership function is preferred, with the independent variable ranging over [0,1] and the dependent variable ranging over [0,1]; the membership categories are {dissimilar, general, similar}.

Secondly, the fuzzification rules are formulated as follows:

If (attribute information and attribute value are similar) or (attribute information and log are similar) or (attribute value and log are similar) or (attribute information, attribute value and log are all similar), the two compared entities are similar;

If (attribute information is similar, attribute value and log are general) or (attribute value is similar, attribute information and log are general) or (log is similar, attribute information and attribute value are general), the two compared entities are similar;

If attribute information, attribute value and log are all general, the two compared entities are similar;

Else the two compared entities are dissimilar;

where If, or and Else are the logical conditions "if", "or" and "otherwise", respectively.

Finally, defuzzification is performed: if the two compared entities are judged dissimilar according to the fuzzification rules, the result is 0; if they are judged similar, the result is the average of the largest membership degrees in the three membership vectors.
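The three stages (fuzzification, rule evaluation, defuzzification) can be sketched as follows. The breakpoints 0.25, 0.5 and 0.75 of the membership functions are an assumption; they are chosen because they reproduce the membership values of the worked example in this description (0.61 maps to [0, 0.56, 0.44]). Labelling each dimension by its largest membership degree, with ties going to the earlier category, and the reading of the second rule as "one dimension similar, the other two general" are likewise assumptions of this sketch.

```python
LABELS = ("dissimilar", "general", "similar")


def memberships(x):
    """Return [dissimilar, general, similar] membership degrees for x in [0, 1]."""
    dissimilar = min(1.0, max(0.0, (0.5 - x) / 0.25))   # 1 below 0.25, 0 above 0.5
    general = max(0.0, 1.0 - abs(x - 0.5) / 0.25)       # triangle peaking at 0.5
    similar = min(1.0, max(0.0, (x - 0.5) / 0.25))      # 0 below 0.5, 1 above 0.75
    return [dissimilar, general, similar]


def entity_similarity(s_info, s_value, s_log):
    vectors = [memberships(s) for s in (s_info, s_value, s_log)]
    labels = [LABELS[v.index(max(v))] for v in vectors]
    n_similar = labels.count("similar")
    n_general = labels.count("general")
    matched = (n_similar >= 2                            # rule 1: two or three similar
               or (n_similar == 1 and n_general == 2)    # rule 2: one similar, two general
               or n_general == 3)                        # rule 3: all general
    if not matched:
        return 0.0                                       # Else: dissimilar
    return sum(max(v) for v in vectors) / len(vectors)   # mean of largest memberships


result = entity_similarity(0.61, 0.99, 0.39)
```

For the inputs 0.61, 0.99 and 0.39 the dimensions are labelled general, similar and general, the second rule applies, and the result is (0.56 + 1 + 0.56) / 3.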

The result is the degree of similarity of the two compared entities. In the relational database, every time a new entity is added, the entity can be automatically associated and matched with other existing entities in the relational database, so that a matching result between the entities is formed and provided for relevant personnel as reference, manual matching time is saved, and working efficiency is improved.

The following describes the entity similarity degree calculation method according to the present embodiment by using a specific example.

The feasibility of the solution of the present embodiment was verified by analyzing the data set of the product, and the data information of the two entities is shown in tables 4 and 5.

TABLE 4 product Table

TABLE 5 company Table

Taking the data in the first row in tables 4 and 5 as an example, the specific implementation process of the association method between the entities is demonstrated:

Step 1: Calculating attribute information similarity

Step 1-1: Calculating the naive text similarity of the attribute names of the two entities

productID and companyID both have a character length of 9, i.e., l₁ = 9 and l₂ = 9, and the edit distance between productID and companyID is D = 7. Substituting into formula (1) gives the naive text similarity S₁ = 0.22.

Step 1-2: Calculating the text semantic similarity of the attribute names of the two entities

In light of the actual situation, the lexical dictionary provided by WordNet is used to build the corresponding tree diagram, as shown in fig. 8. From the positions of the attribute names productID and companyID in the tree diagram, H = 7, N₁ = 40 and N₂ = 46 are obtained; substituting into formula (2) gives the text semantic similarity between productID and companyID, S₂ = 0.14.

Step 1-3: Generating the attribute name similarity of the two entities

Substituting S₁ and S₂ into formula (3) gives S₃ = Max(S₁, S₂) = Max(0.22, 0.14) = 0.22; that is, the similarity of the attribute names productID and companyID is 0.22.

Step 1-4: Calculating attribute constraint similarity

From the attribute constraint information in tables 4 and 5, v = [1,1,1,1,1]. According to formula (5), the attribute constraint similarity is S₄ = 1. According to formula (6), with the weights set to α = 0.5 and β = 0.5, the attribute information similarity is S₅ = α·S₃ + β·S₄ = 0.5×0.22 + 0.5×1 = 0.61.

Step 2: Calculating attribute value similarity

Since the attribute values of both entities are integer data, the attribute value similarity in this example is calculated with the method for numerical attribute values.

That is, the value of each element is first calculated according to the feature vector elements defined in table 3, forming the feature vectors u and v. Assuming u = [50.5, 1, 29.01, 100, 1] and v = [28.5, 1, 16.31, 56, 1], formula (7) gives the attribute value similarity of the two entities, S₆ = 0.99.

Step 3: Calculating log similarity

Counting the relevant log information shows that productID and companyID appear in the log 447 and 389 times, respectively, and appear together 328 times. According to the log similarity formula, S₁₀ = 328/(447 + 389) ≈ 0.39.

Step 4: Fuzzy logic similarity determination

From the steps above, the three-dimensional vector measuring the similarity of productID and companyID is [0.61, 0.99, 0.39]. Fuzzifying these values with the triangular membership functions gives the membership vectors [0, 0.56, 0.44], [0, 0, 1] and [0.44, 0.56, 0], in that order.

Here, [0, 0.56, 0.44] indicates that the probability that the attribute information of the two entities is dissimilar is 0, the probability that its similarity is general is 0.56, and the probability that it is similar is 0.44;

[0, 0, 1] indicates that the probability that the attribute values of the two entities are dissimilar is 0, the probability that their similarity is general is 0, and the probability that they are similar is 1;

[0.44, 0.56, 0] indicates that the probability that the logs of the two entities are dissimilar is 0.44, the probability that their similarity is general is 0.56, and the probability that they are similar is 0.

Therefore, according to the fuzzification rules, the two entities can be judged to be similar.

The similarity of productID and companyID obtained by defuzzification is (0.56 + 1 + 0.56)/3 ≈ 0.707.

The similarity of the remaining rows of data in tables 4 and 5 can be calculated by the same steps; the results are shown in table 6.

TABLE 6

As can be seen from table 6, there is a large correlation between the air conditioner price and the compressor price.

Of course, the above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make modifications and refinements without departing from the principle of the present invention, and such modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. An entity association mining method based on a dynamic probe is characterized by comprising the following steps:

processing the sensed data to form formatted data of an entity and storing the formatted data into a relational database;

calculating the similarity of the attribute information of the two compared entities;

calculating the log similarity of two comparison entities;

obtaining the similarity degree of the two comparison entities by using a fuzzy logic reasoning method according to the attribute information similarity degree, the attribute value similarity degree and the log similarity degree which are obtained by calculation;

wherein, the process of calculating the similarity of the attribute information of the two comparison entities comprises the following steps:

the attribute information comprises an attribute name and an attribute constraint;

the calculation process of the attribute name similarity comprises the following steps:

calculating the naive text similarity S₁;

calculating the text semantic similarity S₂;

selecting the maximum of S₁ and S₂ as the attribute name similarity S₃;

The calculation process of the attribute constraint similarity comprises the following steps:

respectively defining the attribute constraint vectors of the two compared entities as A and B, wherein A_i and B_i respectively represent the values of the i-th candidate constraint corresponding to vector A and vector B;

calculating

$$v_i = \begin{cases} 1, & A_i = B_i \\ 0, & \text{otherwise} \end{cases} \qquad i = 1, \dots, n$$

wherein n is the number of candidate constraints in vector A and vector B, and "otherwise" denotes the cases other than A_i = B_i;

calculating the attribute constraint similarity

$$S_4 = \frac{1}{n} \sum_{i=1}^{n} v_i$$

calculating the attribute information similarity of the two compared entities by a weighting algorithm, S₅ = α·S₃ + β·S₄, wherein α and β are weights, α ∈ [0,1], β ∈ [0,1], and α + β = 1.

2. The dynamic probe-based entity association mining method of claim 1, wherein the naive text similarity S₁ is calculated by the formula:

$$S_1 = 1 - \frac{D}{\mathrm{Max}(l_1, l_2)}$$

wherein w₁ and w₂ are the attribute names of the two compared entities, respectively; l₁ and l₂ are the character lengths of the attribute names w₁ and w₂; D is the edit distance between the attribute names w₁ and w₂; and Max is the function that takes the maximum value.

3. The dynamic probe-based entity association mining method according to claim 1 or 2, wherein the text semantic similarity S₂ is calculated as follows:

establishing a tree semantic hierarchy relation to form a tree diagram;

calculating, according to the corresponding positions of the attribute names in the tree diagram, the similarity between the attribute names w₁ and w₂ of the two compared entities

$$S_2 = \frac{2H}{N_1 + N_2 + 2H}$$

wherein N₁ and N₂ respectively represent the shortest path lengths from the attribute names w₁ and w₂ to their nearest common parent node w; and H denotes the shortest path length from w to the root node.

4. The dynamic probe-based entity association mining method according to claim 1, wherein the process of calculating the similarity of the attribute values of two compared entities is:

according to different data types, attribute values are divided into four types, namely: numeric type, character type, enumeration type, text type;

for numerical attribute values, selecting some or all of the mean, median, mode, sample standard deviation, maximum and minimum as feature vector elements to form the feature vectors u and v corresponding to the two compared entities, and calculating the attribute value similarity S₆ of the two compared entities

[The formula for S₆ appears as an image in the original publication and is not reproduced here.]

for character-type attribute values, first merging the attribute values of the two compared entities to form a corpus; then calculating the term frequency-inverse document frequency corresponding to the attribute values of each entity with a term frequency-inverse document frequency algorithm, correspondingly forming the vectors U and V; and calculating the attribute value similarity S₇ of the two compared entities

[The formula for S₇ appears as an image in the original publication and is not reproduced here.]

for enumerated attribute values, the attribute value of each entity comprising at least two data items, converting the attribute values of the two compared entities into two sets A and B, and calculating the attribute value similarity of the two compared entities

$$S_8 = \frac{|A \cap B|}{|A \cup B|}$$

wherein ∩ is the intersection symbol and ∪ is the union symbol;

for text-type attribute values, establishing a mathematical model with an autoencoder algorithm from deep learning, training the model with data in the attribute values, and calculating the attribute value similarity of the two compared entities with the trained model.

5. The dynamic probe-based entity association mining method of claim 4, wherein the process of calculating the similarity of the attribute values of two compared entities for the text-type attribute values is:

randomly selecting k data from the attribute value of one entity to form a training set, forming a test set by using the remaining data, and training the established mathematical model by using the training set;

predefining a threshold value omega, if the similarity result obtained by calculating the trained mathematical model of the data in the test set is greater than omega, determining that the data is similar to the training set; calculating the proportion of the number of data judged to be similar after the trained mathematical model in the test set to k, and recording as lambda;

forming a test set by using all data in the attribute value of the other entity, repeating the previous step, and recording the obtained proportion as theta;

calculating similarity of attribute values of two contrasting entities

Where Min is a function of minimum.

6. The entity association mining method based on dynamic probe as claimed in claim 1, wherein the process of calculating the log similarity of two compared entities is:

in the system operation process, the fusion feature space stores a log file;

denoting the two compared entities as a and b, the log similarity of the two compared entities is:

$$S_{10} = \frac{N_{ab}}{N_a + N_b}$$

wherein N_a and N_b are the numbers of times the attribute names and/or attribute values of entities a and b, respectively, appear in the log file; and N_ab is the number of times the attribute names and/or attribute values of entities a and b appear in the log file at the same time.

7. The entity association mining method based on dynamic probe as claimed in claim 6, wherein in the process of counting the times of occurrence and simultaneous occurrence of attribute names and/or attribute values of entities a, b in the log file, the number of occurrences of SQL command is used for counting.

8. The dynamic probe-based entity association mining method according to claim 1, wherein the process of using fuzzy logic reasoning method to derive the similarity degree of two comparison entities is:

fuzzifying the attribute information similarity, the attribute value similarity and the log similarity of the two compared entities respectively with a triangular membership function, correspondingly generating membership vectors over the categories dissimilar, general and similar;

judging whether the two compared entities are similar according to the specified fuzzification rules;

if the two compared entities are judged dissimilar, the result is 0;

if the two compared entities are judged similar, the result is the average of the largest membership degrees in the three membership vectors.

9. The dynamic probe-based entity association mining method of claim 8, wherein the fuzzification rule is:

Else the two compared entities are dissimilar;

wherein If, or and Else are respectively logic conditions: if, or otherwise.