CN115374223A

CN115374223A - Intelligent blood relationship identification recommendation method and system based on rules and machine learning

Info

Publication number: CN115374223A
Application number: CN202210766523.1A
Authority: CN
Inventors: 金震; 张京日; 穆宇浩; 詹焕哲
Original assignee: Beijing SunwayWorld Science and Technology Co Ltd
Current assignee: Beijing SunwayWorld Science and Technology Co Ltd
Priority date: 2022-06-30
Filing date: 2022-06-30
Publication date: 2022-11-22
Anticipated expiration: 2042-06-30
Also published as: CN115374223B

Abstract

The invention discloses an intelligent blood relationship (data lineage) identification recommendation method and system based on rules and machine learning. The method comprises the following steps: constructing a machine learning model, and identifying a plurality of pieces of characteristic information of all data fields based on the machine learning model, the characteristic information comprising the unique values, the maximum value and the minimum value of a field; clustering the data fields based on the machine learning model to obtain a plurality of clusters; comparing the unique values of the data fields in each cluster based on a data pattern comparison rule, and determining intersection coverage relations based on the unique values; sorting the intersection coverage relations; and filtering based on the sorting, and forming a blood relationship list among the physical tables after filtering. By combining the data pattern comparison rule with machine learning, the invention identifies and discovers the blood relationships of data and helps enterprises construct a data network, which greatly reduces the cost of enterprise data governance and effectively improves its efficiency.

Description

Intelligent blood relationship identification recommendation method and system based on rules and machine learning

Technical Field

The invention relates to the technical field of data management, in particular to an intelligent blood relationship identification recommendation method and system based on rules and machine learning.

Background

Data blood relationship (data lineage) is a key element of practical data governance. It can effectively resolve the disconnect between governance and development (the so-called "two skins" problem) and supports the traceability analysis, impact assessment and similar tasks that arise during data governance and development. At present, however, data development tools vary widely; for example, some identify data blood relationships by SQL parsing. SQL (Structured Query Language) is a special-purpose programming language for accessing data and for querying, updating and managing relational database systems.

The prior art has the following defects: data is scattered, and data blood relationships cannot be effectively identified and managed. In many cases they are identified manually, which wastes considerable cost and also greatly slows progress toward intelligent data governance.

Disclosure of Invention

The invention provides an intelligent blood relationship identification recommendation method and system based on rules and machine learning, and aims to solve the problems in the prior art.

The invention provides an intelligent blood relationship identification recommendation method based on rules and machine learning, which comprises the following steps:

S100, constructing a machine learning model, and identifying a plurality of pieces of characteristic information of all data fields based on the machine learning model, the characteristic information comprising the unique values, the maximum value and the minimum value of a field;

S200, clustering the data fields based on the machine learning model to obtain a plurality of clusters;

S300, comparing the unique values of the data fields in each cluster based on a data pattern comparison rule, and determining intersection coverage relations based on the unique values;

S400, sorting the intersection coverage relations;

S500, filtering based on the sorting, and forming a blood relationship list among the physical tables after filtering.
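The following is a minimal Python sketch of steps S100 and S300 only, not the patented implementation. It assumes that the characteristic information can be gathered by a plain scan of the column values rather than by a machine learning model, and that an intersection coverage relation can be scored as the fraction of one field's unique values that also occur in another field. The names `FieldProfile`, `profile_field` and `coverage`, and the sample tables, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldProfile:
    """Characteristic information of one data field (hypothetical layout)."""
    table: str
    column: str
    unique_values: frozenset
    minimum: object
    maximum: object


def profile_field(table, column, values):
    """S100 (sketch): collect the unique, minimum and maximum values."""
    non_null = [v for v in values if v is not None]
    return FieldProfile(
        table,
        column,
        frozenset(non_null),
        min(non_null, default=None),
        max(non_null, default=None),
    )


def coverage(source, target):
    """S300 (sketch): share of source's unique values also found in target."""
    if not source.unique_values:
        return 0.0
    shared = source.unique_values & target.unique_values
    return len(shared) / len(source.unique_values)


orders = profile_field("orders", "customer_id", [1, 2, 2, 3, None])
customers = profile_field("customers", "id", [1, 2, 3, 4])

forward = coverage(orders, customers)   # 1.0: every order id exists in customers
backward = coverage(customers, orders)  # 0.75: customer 4 has no order
```

In this example every `orders.customer_id` value occurs in `customers.id`, which is the pattern one would expect when the first field references or is derived from the second.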

Preferably, after step S500, the method further includes:

S600, recommending the top-ranked entries in the blood relationship list to a user for selection, the user selecting from the recommended upstream and downstream physical tables, and adding the selected table as a new feature into the calculation of the sorting of the intersection coverage relations.

Preferably, the S200 includes:

S201, performing text semantic extraction on the content of the data fields based on the machine learning model to obtain the semantics of the data fields;

S202, clustering the data fields according to content, type, semantics and labels to form a plurality of clusters with different characteristics.

Preferably, the method for calculating the cluster includes:

forming the data field into view data;

extracting a feature matrix of the data from each view, and learning the similarity graphs of all the views by a dynamic neighbor graph construction method; calculating a transition probability matrix corresponding to each view; and taking the transition probability matrix as the input of a Markov chain spectral clustering algorithm to obtain a clustering result;

specifically, the transition probability matrix is calculated as follows: stacking the transition probability matrices of the views to construct a target tensor, rotating the tensor, dividing the rotated tensor into a clean tensor and an error tensor, constraining the clean tensor with a t-SVD-based tensor nuclear norm to obtain a low-rank clean tensor, and summing all lateral slices of the low-rank clean tensor to obtain the transition probability matrix;

the target tensor is constructed on the premise that an objective function has been constructed, and the target tensor is determined based on the objective function.
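As a rough illustration of the per-view transition probability matrices, the Python sketch below row-normalises a similarity graph, which is the usual Markov chain construction P = D^-1 W. It deliberately replaces the low-rank tensor step with a naive element-wise average over views, so it shows only the shape of the data flow and not the method described above. The function names and the two toy views are hypothetical.

```python
def transition_matrix(similarity):
    """Row-normalise a non-negative similarity matrix: P = D^-1 W."""
    matrix = []
    for row in similarity:
        total = sum(row)
        if total > 0:
            matrix.append([w / total for w in row])
        else:
            # An isolated node gets a uniform row so P stays row-stochastic.
            matrix.append([1.0 / len(row)] * len(row))
    return matrix


def average_views(matrices):
    """Naive stand-in for the low-rank tensor step: element-wise mean."""
    n = len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
        for i in range(n)
    ]


# Two toy similarity graphs over the same three data fields.
view_a = [[0, 2, 0], [2, 0, 2], [0, 2, 0]]
view_b = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]

combined = average_views([transition_matrix(view_a), transition_matrix(view_b)])
```

Because each per-view matrix is row-stochastic, their average is row-stochastic as well, so `combined` can still be read as a transition probability matrix and passed to a spectral clustering step.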

The optimization of the objective function comprises the optimization of a tensor A, constructed from the low-rank matrices, and of an error tensor B, constructed from the noise matrices decomposed from each view;

the optimization formula for tensor A is as follows:

[equation missing from the source text]

where $\mathcal{A}^{t+1}$ denotes the value of tensor $\mathcal{A}$ after the $(t+1)$-th iteration, $\mathcal{A}$ denotes the low-rank tensor, $\mu^{t}$ denotes the penalty parameter at the $t$-th iteration with $\mu^{t} > 0$, $t$ denotes the iteration number, $\mathcal{Y}^{t}$ denotes the Lagrange multiplier at the $t$-th iteration, $\mathcal{T}$ denotes the rotated target tensor, which comprises tensor $\mathcal{A}$ and tensor $\mathcal{B}$, $\|\cdot\|_{F}$ denotes the Frobenius norm, and $\mathcal{B}^{t}$ denotes the value of tensor $\mathcal{B}$ at the $t$-th iteration;

the optimization formula for tensor B is as follows:

[equation missing from the source text]

where $B_{(3)}$ denotes the mode-3 matricization of tensor $\mathcal{B}$, $\mathcal{B}$ is the error tensor, and $\gamma$ denotes a non-negative balance parameter;

$B_{(3)}^{t+1}$ denotes the optimized value, after mode-3 matricization, at the $(t+1)$-th iteration, $\mu^{t}$ denotes the penalty parameter at the $t$-th iteration with $\mu^{t} > 0$, $t$ denotes the iteration number,

$Y_{(3)}^{t}$ denotes the mode-3 matricization of the Lagrange multiplier at the $t$-th iteration, $T_{(3)}$ denotes the mode-3 matricization of the rotated target tensor, $\|\cdot\|_{F}$ denotes the Frobenius norm,

and $A_{(3)}^{t+1}$ denotes the mode-3 matricization of the value of tensor $\mathcal{A}$ at the $(t+1)$-th iteration.

The optimization result of the objective function is then determined from the optimization of tensor A and tensor B.

This formulation converges well and reduces computational complexity.
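The update formulas themselves are not reproduced in this text. As a hedged reconstruction only, assuming a standard augmented-Lagrangian (ADMM) scheme for the constraint $\mathcal{T} = \mathcal{A} + \mathcal{B}$, a t-SVD-based tensor nuclear norm $\|\cdot\|_{\circledast}$ on the low-rank tensor and an $\ell_{2,1}$ norm on the error term, updates consistent with the symbol definitions given for tensor A and tensor B would take the following form. The choice of norms and the sign of the multiplier term are assumptions and are not stated by the source.

```latex
\begin{aligned}
\mathcal{A}^{t+1} &= \arg\min_{\mathcal{A}} \;
  \|\mathcal{A}\|_{\circledast}
  + \frac{\mu^{t}}{2}
    \left\| \mathcal{A} - \left( \mathcal{T} - \mathcal{B}^{t}
      + \frac{\mathcal{Y}^{t}}{\mu^{t}} \right) \right\|_{F}^{2}, \\
B_{(3)}^{t+1} &= \arg\min_{B_{(3)}} \;
  \gamma \, \| B_{(3)} \|_{2,1}
  + \frac{\mu^{t}}{2}
    \left\| B_{(3)} - \left( T_{(3)} - A_{(3)}^{t+1}
      + \frac{Y_{(3)}^{t}}{\mu^{t}} \right) \right\|_{F}^{2}.
\end{aligned}
```

Under these assumptions the first subproblem is solved by singular value thresholding in the t-SVD domain and the second by column-wise shrinkage, which is why schemes of this kind are usually inexpensive per iteration.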

Preferably, the S400 includes:

sorting the intersection coverage relations by the PageRank ranking method.

Preferably, the S500 includes:

S501, setting a sorting threshold for forming blood relationships between physical tables;

S502, filtering based on the sorting and the sorting threshold to form a blood relationship list between physical tables.
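Steps S501 and S502 can be pictured with the short Python sketch below. It assumes that each candidate relation is a `(source table, target table, score)` tuple and that the sorting threshold is a simple lower bound on the score; both the tuple layout and the value 0.5 are illustrative assumptions.

```python
def filter_lineage(scored_edges, threshold):
    """Keep edges scoring at least `threshold`, highest score first."""
    kept = [edge for edge in scored_edges if edge[2] >= threshold]
    return sorted(kept, key=lambda edge: edge[2], reverse=True)


# Hypothetical candidate relations between physical tables.
edges = [
    ("orders", "customers", 0.92),
    ("orders", "logs", 0.15),
    ("invoices", "orders", 0.61),
]

lineage = filter_lineage(edges, threshold=0.5)
# lineage == [("orders", "customers", 0.92), ("invoices", "orders", 0.61)]
```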

The invention provides an intelligent blood relationship identification recommendation system based on rules and machine learning, which comprises:

the characteristic information identification unit is used for constructing a machine learning model and identifying a plurality of pieces of characteristic information of all data fields based on the machine learning model; the characteristic information comprises a unique value, a maximum value and a minimum value of a field;

the clustering unit is used for clustering the data fields based on the machine learning model to obtain a plurality of clusters;

the intersection coverage relation determining unit is used for comparing the unique values of the data fields in each cluster based on the data pattern comparison rule and determining intersection coverage relations based on the unique values;

the sorting unit is used for sorting the intersection coverage relations;

and the blood relationship list forming unit is used for filtering based on the sorting and forming the blood relationship list among the physical tables after filtering.

Preferably, the system further comprises:

the recommending unit, which is used for recommending the top-ranked entries in the blood relationship list to a user for selection, the user selecting from the recommended upstream and downstream physical tables, and the selected table being added as a new feature into the calculation of the sorting of the intersection coverage relations.

Preferably, the clustering unit includes:

the semantic extraction subunit is used for performing text semantic extraction on the content of the data field based on the machine learning model to obtain the semantics of the data field;

and the characteristic clustering subunit is used for clustering the data fields according to the content, the type, the semantics and the labels to form a plurality of clusters containing different characteristics.

Preferably, the sorting unit includes:

and the PageRank sorting subunit is used for sorting the intersection coverage relation by adopting a PageRank sorting method.

Preferably, the blood relationship list forming unit includes:

a sorting threshold setting subunit, configured to set a sorting threshold for forming blood relationships between physical tables;

and a filtering subunit, configured to filter based on the sorting and the sorting threshold to form a blood relationship list between physical tables.

Compared with the prior art, the invention has the following advantages:

the invention provides an intelligent blood relationship identification recommendation method and system based on rules and machine learning, the method comprising the following steps: constructing a machine learning model, and identifying a plurality of pieces of characteristic information of all data fields based on the machine learning model, the characteristic information comprising the unique values, the maximum value and the minimum value of a field; clustering the data fields based on the machine learning model to obtain a plurality of clusters; comparing the unique values of the data fields in each cluster based on a data pattern comparison rule, and determining intersection coverage relations based on the unique values; sorting the intersection coverage relations; and filtering based on the sorting, and forming a blood relationship list among the physical tables after filtering.

The scheme adopted by the invention combines the data pattern comparison rule with machine learning to identify and discover the blood relationships of data, and helps enterprises construct a data network. This greatly reduces the cost of enterprise data governance and effectively improves its efficiency.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

FIG. 1 is a flowchart of an intelligent blood relationship identification recommendation method based on rules and machine learning according to an embodiment of the present invention;

FIG. 2 is a display diagram of the identification recommendation interface of the intelligent blood relationship identification recommendation method based on rules and machine learning according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an intelligent blood relationship identification recommendation system based on rules and machine learning according to an embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.

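The embodiment of the invention provides an intelligent blood relationship identification recommendation method based on rules and machine learning. Referring to FIG. 1, the method comprises the following steps:

S100, constructing a machine learning model, and identifying a plurality of pieces of characteristic information of all data fields based on the machine learning model, the characteristic information comprising the unique values, the maximum value and the minimum value of a field;

S200, clustering the data fields based on the machine learning model to obtain a plurality of clusters;

S300, comparing the unique values of the data fields in each cluster based on a data pattern comparison rule, and determining intersection coverage relations based on the unique values;

S400, sorting the intersection coverage relations;

S500, filtering based on the sorting, and forming a blood relationship list among the physical tables after filtering.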

The working principle of the technical scheme is as follows: a machine learning model is constructed, and a plurality of pieces of characteristic information of all data fields are identified based on the machine learning model, the characteristic information comprising the unique values, the maximum value and the minimum value of a field; the data fields are clustered based on the machine learning model to obtain a plurality of clusters; the unique values of the data fields in each cluster are compared based on a data pattern comparison rule, and intersection coverage relations are determined based on the unique values; the intersection coverage relations are sorted; and filtering is performed based on the sorting, the filtered result forming a blood relationship list among the physical tables.

The beneficial effects of the above technical scheme are: with the scheme provided by this embodiment, the characteristic information of all data fields is identified, the data fields are clustered, the unique values within each cluster are compared to determine intersection coverage relations, and these relations are sorted and filtered to form a blood relationship list among the physical tables.

The scheme adopted by this embodiment combines the data pattern comparison rule with machine learning to identify and discover the blood relationships of data, and helps enterprises construct a data network. This greatly reduces the cost of enterprise data governance and effectively improves its efficiency.

In another embodiment, after step S500, the method further includes step S600.

The working principle of the technical scheme is as follows: in this embodiment, the top-ranked entries in the blood relationship list are recommended to the user for selection, the user selects from the recommended upstream and downstream physical tables, and the selected table is added as a new feature into the calculation of the sorting of the intersection coverage relations.

Referring to FIG. 2, by generating the corresponding blood relationship list, the data blood relationship system can recommend upstream and downstream physical tables (the automatic classification result) to the user. The user can select the corresponding physical table according to the classification result, and the selection then takes part in subsequent calculation as a new feature.
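The recommendation and feedback loop can be sketched in Python as follows. The source does not specify how a user selection enters the ranking as a new feature, so the sketch assumes the simplest possible mechanism: a fixed bonus added to the score of each confirmed table pair. The functions `recommend` and `apply_feedback`, the bonus value and the sample scores are all hypothetical.

```python
def recommend(scores, top_k=2):
    """Return the top_k table pairs by score, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def apply_feedback(scores, confirmed, bonus=0.2):
    """Return new scores in which user-confirmed pairs get a fixed bonus."""
    return {
        pair: (score + bonus if pair in confirmed else score)
        for pair, score in scores.items()
    }


# Hypothetical ranking scores for (downstream, upstream) table pairs.
scores = {
    ("orders", "customers"): 0.60,
    ("orders", "regions"): 0.55,
    ("orders", "logs"): 0.50,
}

shown = recommend(scores)                       # pairs offered to the user
updated = apply_feedback(scores, {("orders", "regions")})
reranked = recommend(updated)                   # confirmed pair now ranks first
```

After the user confirms `("orders", "regions")`, that pair moves ahead of `("orders", "customers")` in the next round of recommendations.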

The beneficial effects of the above technical scheme are: with the scheme provided by this embodiment, the top-ranked entries in the blood relationship list are recommended to the user for selection, the user selects from the recommended upstream and downstream physical tables, and the selected table is added as a new feature into the calculation of the sorting of the intersection coverage relations.

In another embodiment, the S200 includes:

The working principle of the technical scheme is as follows: the scheme adopted by the embodiment is that text semantics extraction is carried out on the content of the data field based on a machine learning model to obtain the semantics of the data field; and clustering the data fields according to the content, the type, the semantics and the labels to form a plurality of clusters containing different characteristics.

Available clustering methods include the k-means clustering algorithm, the hierarchical clustering algorithm and the spectral clustering algorithm.
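Of the algorithms listed, k-means is the simplest to show. The Python sketch below is a plain textbook k-means, seeded deterministically with the first k points, and is not taken from the patent. It assumes each data field has already been encoded as a numeric feature vector; the three features used here (is-numeric flag, scaled average length, distinct-value ratio) are hypothetical.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, point in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum(
                    (a - b) ** 2 for a, b in zip(point, centroids[c])
                ),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical field features: [is_numeric, scaled_avg_length, distinct_ratio]
fields = ["orders.customer_id", "customers.name", "customers.id", "orders.note"]
features = [
    [1.0, 0.1, 0.9],
    [0.0, 0.8, 0.7],
    [1.0, 0.2, 1.0],
    [0.0, 0.7, 0.6],
]

labels = kmeans(features, k=2)
```

With these toy vectors the two identifier-like fields fall into one cluster and the two free-text fields into the other, so the later unique-value comparison only needs to run within each group.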

In addition, text semantics can be extracted with a semantic extraction model. The model converts the input text into word vectors, processes the word vectors with a one-dimensional convolutional structure that omits the pooling layer to obtain dual-granularity features, and uses a dropout layer to prevent overfitting. A global attention mechanism then uses the context information and the hidden unit information to obtain a weight vector for each part and to distribute the weights, and the text classification is obtained through an activation function and a fully connected layer, thereby realizing text semantic extraction.
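Only the global attention step of such a model is illustrated below, as a small pure-Python sketch. It assumes dot-product scoring against a single context (query) vector followed by a softmax, which is one common form of global attention; the source does not say which scoring function is used. The convolution, dropout and fully connected layers are omitted, and `attention_pool` and the sample vectors are hypothetical.

```python
import math


def attention_pool(hidden_states, query):
    """Weight each hidden vector by softmax(dot(h, query)) and sum them."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(len(query))
    ]
    return weights, pooled


# Three toy hidden vectors (one per token) and a context vector.
hidden = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
context = [2.0, 0.0]

weights, pooled = attention_pool(hidden, context)
```

The token whose hidden vector is best aligned with the context vector receives the largest weight, and the pooled vector is the weighted sum that a classifier layer would then consume.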

The beneficial effects of the above technical scheme are: performing text semantic extraction on the content of the data field based on a machine learning model by adopting the scheme provided by the embodiment to obtain the semantics of the data field; and clustering the data fields according to the content, the type, the semantics and the labels to form a plurality of clusters containing different characteristics.

In another embodiment, the S400 includes:

The working principle of the technical scheme is as follows: in this embodiment, S400 sorts the intersection coverage relations by the PageRank ranking method.

PageRank is a method for identifying the rank or importance of web pages by calculating the rank of the web pages based on their mutual link relationship. The PageRank algorithm calculates the PageRank value for each web page and then ranks the importance of the web page according to the size of the value.
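A compact power-iteration version of PageRank is sketched below in Python. Applying it to physical tables assumes that each intersection coverage relation is treated as a directed link from one table to the table that covers its values; that mapping, the damping factor 0.85 and the sample tables are assumptions made for illustration and are not specified by the source.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict: node -> list of linked nodes."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for n in nodes:
                    new_rank[n] += damping * rank[node] / len(nodes)
        rank = new_rank
    return rank


# Hypothetical links: a table points at the tables that cover its values.
links = {
    "orders": ["customers"],
    "invoices": ["customers", "orders"],
    "customers": [],
}

ranks = pagerank(links)
ordering = sorted(ranks, key=ranks.get, reverse=True)
```

Here `customers` ranks first because both other tables link to it, `orders` second, and `invoices`, which nothing links to, last.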

The beneficial effects of the above technical scheme are: with the scheme provided by this embodiment, step S400 sorts the intersection coverage relations by the PageRank ranking method.

In another embodiment, the S500 includes:

The working principle of the technical scheme is as follows: in this embodiment, S500 includes:

S501, setting a sorting threshold for forming blood relationships between physical tables;

S502, filtering based on the sorting and the sorting threshold to form a blood relationship list between physical tables.

The beneficial effects of the above technical scheme are: with the scheme provided by this embodiment, step S500 sets a sorting threshold and filters based on the sorting and the sorting threshold to form the blood relationship list between physical tables.

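In another embodiment, an intelligent blood relationship identification recommendation system based on rules and machine learning is further provided. Referring to FIG. 3, the system includes:

the characteristic information identification unit, which is used for constructing a machine learning model and identifying a plurality of pieces of characteristic information of all data fields based on the machine learning model, the characteristic information comprising the unique values, the maximum value and the minimum value of a field;

the clustering unit, which is used for clustering the data fields based on the machine learning model to obtain a plurality of clusters;

the intersection coverage relation determining unit, which is used for comparing the unique values of the data fields in each cluster based on the data pattern comparison rule and determining intersection coverage relations based on the unique values;

the sorting unit, which is used for sorting the intersection coverage relations;

and the blood relationship list forming unit, which is used for filtering based on the sorting and forming the blood relationship list among the physical tables after filtering.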

The working principle of the technical scheme is as follows: the scheme adopted by the embodiment is that the system comprises: the characteristic information identification unit is used for constructing a machine learning model and identifying a plurality of pieces of characteristic information of all data fields based on the machine learning model; the characteristic information comprises a unique value, a maximum value and a minimum value of a field; the clustering unit is used for clustering the data fields based on the machine learning model to obtain a plurality of clusters; the intersection covering relation determining unit is used for comparing the unique values of the data fields in each cluster based on the data mode comparison rule and determining the intersection covering relation based on the unique values; the sequencing unit is used for sequencing the intersection coverage relation; and the blood relationship list forming unit is used for carrying out sorting and filtering based on the sorting and forming the blood relationship list among the physical tables after filtering.

The beneficial effects of the above technical scheme are: with the scheme provided by this embodiment, the system comprises: the characteristic information identification unit, which is used for constructing a machine learning model and identifying a plurality of pieces of characteristic information of all data fields based on the machine learning model, the characteristic information comprising the unique values, the maximum value and the minimum value of a field; the clustering unit, which is used for clustering the data fields based on the machine learning model to obtain a plurality of clusters; the intersection coverage relation determining unit, which is used for comparing the unique values of the data fields in each cluster based on the data pattern comparison rule and determining intersection coverage relations based on the unique values; the sorting unit, which is used for sorting the intersection coverage relations; and the blood relationship list forming unit, which is used for filtering based on the sorting and forming the blood relationship list among the physical tables after filtering.

In another embodiment, the system further comprises:

the recommending unit, which is used for recommending the top-ranked entries in the blood relationship list to the user for selection, the user selecting from the recommended upstream and downstream physical tables, and the selected table being added as a new feature into the calculation of the sorting of the intersection coverage relations.

The working principle of the technical scheme is as follows: in this embodiment, the system further comprises the recommending unit described above.

The beneficial effects of the above technical scheme are: with the scheme provided by this embodiment, the system further comprises the recommending unit described above.

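In another embodiment, the clustering unit includes:

the semantic extraction subunit, which is used for performing text semantic extraction on the content of the data fields based on the machine learning model to obtain the semantics of the data fields;

and the characteristic clustering subunit, which is used for clustering the data fields according to content, type, semantics and labels to form a plurality of clusters with different characteristics.

The working principle of the technical scheme is as follows: in this embodiment, the clustering unit comprises the semantic extraction subunit and the characteristic clustering subunit described above.

The beneficial effects of the above technical scheme are: with the scheme provided by this embodiment, the clustering unit comprises the semantic extraction subunit and the characteristic clustering subunit described above.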

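In another embodiment, the sorting unit includes:

the PageRank sorting subunit, which is used for sorting the intersection coverage relations by the PageRank ranking method.

The working principle of the technical scheme is as follows: in this embodiment, the sorting unit comprises the PageRank sorting subunit described above.

The beneficial effects of the above technical scheme are: with the scheme provided by this embodiment, the sorting unit comprises the PageRank sorting subunit described above.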

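In another embodiment, the blood relationship list forming unit includes:

the sorting threshold setting subunit, which is used for setting a sorting threshold for forming blood relationships between physical tables;

and the filtering subunit, which is used for filtering based on the sorting and the sorting threshold to form a blood relationship list between physical tables.

The working principle of the technical scheme is as follows: in this embodiment, the blood relationship list forming unit comprises the sorting threshold setting subunit and the filtering subunit described above.

The beneficial effects of the above technical scheme are: with the scheme provided by this embodiment, the blood relationship list forming unit comprises the sorting threshold setting subunit and the filtering subunit described above.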

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

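1. An intelligent blood relationship identification recommendation method based on rules and machine learning, characterized by comprising the following steps:

S100, constructing a machine learning model, and identifying a plurality of pieces of characteristic information of all data fields based on the machine learning model, the characteristic information comprising the unique values, the maximum value and the minimum value of a field;

S200, clustering the data fields based on the machine learning model to obtain a plurality of clusters;

S300, comparing the unique values of the data fields in each cluster based on a data pattern comparison rule, and determining intersection coverage relations based on the unique values;

S400, sorting the intersection coverage relations;

S500, filtering based on the sorting, and forming a blood relationship list among the physical tables after filtering.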

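2. The intelligent blood relationship identification recommendation method based on rules and machine learning of claim 1, further comprising, after step S500:

S600, recommending the top-ranked entries in the blood relationship list to a user for selection, the user selecting from the recommended upstream and downstream physical tables, and adding the selected table as a new feature into the calculation of the sorting of the intersection coverage relations.

3. The intelligent blood relationship identification recommendation method based on rules and machine learning of claim 1, wherein the S200 comprises:

S201, extracting text semantics from the content of the data fields based on the machine learning model to obtain the semantics of the data fields;

S202, clustering the data fields according to content, type, semantics and labels to form a plurality of clusters with different characteristics.

4. The intelligent blood relationship identification recommendation method based on rules and machine learning of claim 1, wherein the S400 comprises:

sorting the intersection coverage relations by the PageRank ranking method.

5. The intelligent blood relationship identification recommendation method based on rules and machine learning of claim 1, wherein the S500 comprises:

S501, setting a sorting threshold for forming blood relationships between physical tables;

S502, filtering based on the sorting and the sorting threshold to form a blood relationship list between physical tables.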

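6. An intelligent blood relationship identification recommendation system based on rules and machine learning, comprising:

the characteristic information identification unit, which is used for constructing a machine learning model and identifying a plurality of pieces of characteristic information of all data fields based on the machine learning model, the characteristic information comprising the unique values, the maximum value and the minimum value of a field;

the clustering unit, which is used for clustering the data fields based on the machine learning model to obtain a plurality of clusters;

the intersection coverage relation determining unit, which is used for comparing the unique values of the data fields in each cluster based on the data pattern comparison rule and determining intersection coverage relations based on the unique values;

the sorting unit, which is used for sorting the intersection coverage relations;

and the blood relationship list forming unit, which is used for filtering based on the sorting and forming a blood relationship list among the physical tables after filtering.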

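7. The intelligent blood relationship identification recommendation system based on rules and machine learning of claim 6, further comprising:

the recommending unit, which is used for recommending the top-ranked entries in the blood relationship list to a user for selection, the user selecting from the recommended upstream and downstream physical tables, and the selected table being added as a new feature into the calculation of the sorting of the intersection coverage relations.

8. The intelligent blood relationship identification recommendation system based on rules and machine learning according to claim 6, characterized in that the clustering unit comprises:

the semantic extraction subunit, which is used for performing text semantic extraction on the content of the data fields based on the machine learning model to obtain the semantics of the data fields;

and the characteristic clustering subunit, which is used for clustering the data fields according to content, type, semantics and labels to form a plurality of clusters with different characteristics.

9. The intelligent blood relationship identification recommendation system based on rules and machine learning according to claim 6, wherein the sorting unit comprises:

the PageRank sorting subunit, which is used for sorting the intersection coverage relations by the PageRank ranking method.

10. The intelligent blood relationship identification recommendation system based on rules and machine learning according to claim 6, wherein the blood relationship list forming unit comprises:

the sorting threshold setting subunit, which is used for setting a sorting threshold for forming blood relationships between physical tables;

and the filtering subunit, which is used for filtering based on the sorting and the sorting threshold to form a blood relationship list between physical tables.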