CN115374223A - Intelligent blood relationship identification recommendation method and system based on rules and machine learning - Google Patents


Publication number
CN115374223A
Authority
CN
China
Prior art keywords
machine learning
sorting
data
learning model
blood relationship
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210766523.1A
Other languages
Chinese (zh)
Other versions
CN115374223B (en)
Inventor
金震
张京日
穆宇浩
詹焕哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing SunwayWorld Science and Technology Co Ltd
Original Assignee
Beijing SunwayWorld Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing SunwayWorld Science and Technology Co Ltd
Priority to CN202210766523.1A
Publication of CN115374223A
Application granted
Publication of CN115374223B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24578 Query processing with adaptation to user needs using ranking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an intelligent blood relationship (data lineage) identification recommendation method and system based on rules and machine learning. The method comprises the following steps: constructing a machine learning model, and identifying a plurality of pieces of characteristic information of all data fields based on the machine learning model, the characteristic information comprising the unique values, maximum value and minimum value of a field; clustering the data fields based on the machine learning model to obtain a plurality of clusters; comparing the unique values of the data fields in each cluster based on a data pattern comparison rule, and determining intersection coverage relations based on the unique values; sorting the intersection coverage relations; and filtering based on the sorting to form a blood relationship list among the physical tables. By combining the data pattern comparison rule with machine learning, the method identifies and discovers the blood relationships of data and helps enterprises construct a data network. The cost of enterprise data governance is greatly reduced, and the efficiency of data governance is effectively improved.

Description

Intelligent blood relationship identification recommendation method and system based on rules and machine learning
Technical Field
The invention relates to the technical field of data management, in particular to an intelligent blood relationship identification recommendation method and system based on rules and machine learning.
Background
Data blood relationship (data lineage) is a key concern in practical data governance. It can effectively resolve the disconnect between governance and development (the so-called "two skins" phenomenon), and it supports traceability analysis, impact assessment and similar tasks during data management and development. At present, however, data development tools are highly varied, and data blood relationships are commonly identified by approaches such as SQL parsing. SQL (Structured Query Language) is a special-purpose programming language: a database query and programming language used to access data and to query, update, and manage relational database systems.
The prior art has the following defects: data is scattered, and the blood relationships of data cannot be effectively identified and managed. In many cases they are identified manually, which wastes considerable cost and greatly slows the move toward intelligent data governance.
Disclosure of Invention
The invention provides an intelligent blood relationship identification recommendation method and system based on rules and machine learning, aiming to solve the above problems in the prior art.
The invention provides an intelligent blood relationship identification recommendation method based on rules and machine learning, which comprises the following steps:
S100, constructing a machine learning model, and identifying a plurality of pieces of characteristic information of all data fields based on the machine learning model; the characteristic information comprises the unique values, maximum value and minimum value of a field;
S200, clustering the data fields based on the machine learning model to obtain a plurality of clusters;
S300, comparing the unique values of the data fields in each cluster based on a data pattern comparison rule, and determining intersection coverage relations based on the unique values;
S400, sorting the intersection coverage relations;
S500, filtering based on the sorting, and forming a blood relationship list among the physical tables after filtering.
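As an illustration of the characteristic information named in step S100, the following minimal sketch profiles a single field. It assumes the field's values are already loaded into a Python list; the function name `profile_field` and the sample column are hypothetical, and the sketch computes the profile directly rather than through the machine learning model the method refers to.

```python
from typing import Any, Dict, List


def profile_field(values: List[Any]) -> Dict[str, Any]:
    """Collect the per-field characteristics named in S100 (unique values, max, min)."""
    non_null = [v for v in values if v is not None]  # ignore missing entries
    unique = set(non_null)
    return {
        "unique_values": unique,
        "max": max(unique) if unique else None,
        "min": min(unique) if unique else None,
    }


# Hypothetical sample column, for illustration only.
order_ids = [3, 1, 2, 2, None, 3]
profile = profile_field(order_ids)
```

The unique-value set is kept whole because the later comparison step works on set intersections, not only on the value range.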
Preferably, after step S500, the method further includes:
S600, recommending the top-ranked entries in the blood relationship list to a user; the user selects from the recommended upstream and downstream physical tables, and the selected table is added as a new feature to the calculation of the intersection coverage relation sorting.
Preferably, the S200 includes:
S201, performing text semantic extraction on the content of the data fields based on the machine learning model to obtain the semantics of the data fields;
S202, clustering the data fields according to content, type, semantics and labels to form a plurality of clusters with different characteristics.
Preferably, the method for calculating the cluster includes:
forming the data fields into view data;
extracting a feature matrix of the data from each view, and learning the similarity graphs of all views using a dynamic neighbor graph construction method; calculating the transition probability matrix corresponding to each view; and taking the transition probability matrix as the input of a Markov chain spectral clustering algorithm to obtain the clustering result;
specifically, the transition probability matrix is calculated as follows: stacking the transition probability matrices of the views to construct a target tensor; rotating the tensor; dividing the rotated tensor into a clean tensor and an error tensor; constraining the clean tensor with a tensor nuclear norm based on the t-SVD to obtain a low-rank clean tensor; and summing all lateral slices of the low-rank clean tensor to obtain the transition probability matrix;
the target tensor is constructed on the premise that an objective function has been constructed, and the target tensor is determined based on the objective function.
The optimization of the objective function comprises the optimization of a tensor A, constructed from the low-rank matrices, and of an error tensor B, constructed from the noise matrices decomposed from each view;
the optimization formula for tensor A is as follows:
[Formula image BDA0003722379940000021, not reproduced in this text]
where A_{t+1} denotes the value of tensor A after the (t+1)-th iteration; A denotes the low-rank tensor; μ_t denotes the penalty parameter at the t-th iteration, with μ_t > 0; t denotes the iteration number; Y_t denotes the Lagrange multiplier of tensor A at the t-th iteration; T denotes the rotated target tensor, which comprises tensor A and tensor B; F denotes the norm (in the usual notation, the Frobenius norm); and B_t denotes the value of tensor B at the t-th iteration;
the optimization formula for tensor B is as follows:
[Formula image BDA0003722379940000031, not reproduced in this text]
where B_(3) denotes the mode-3 matricization of tensor B; B is the error tensor; γ denotes a non-negative balance parameter; the symbol in [formula image BDA0003722379940000032] denotes the optimized value of the mode-3 matricization at the (t+1)-th iteration; μ_t denotes the penalty parameter at the t-th iteration, with μ_t > 0; t denotes the iteration number; the symbol in [formula image BDA0003722379940000033] denotes the Lagrange multiplier at the t-th iteration after mode-3 matricization of tensor B; T_(3) denotes the mode-3 matricization of the rotated target tensor; F denotes the norm; and the symbol in [formula image BDA0003722379940000034] denotes the optimized value of the mode-3 matricization of tensor A at the (t+1)-th iteration.
The optimization result of the objective function is determined from the optimization of tensor A and tensor B.
These update formulas converge well and reduce the computational complexity.
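The per-view part of this computation can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed procedure: a static k-nearest-neighbour graph with Gaussian weights stands in for the dynamic neighbor graph construction, and a plain average over views stands in for the low-rank tensor optimization. The function names, the bandwidth choice and the random sample views are all hypothetical.

```python
import numpy as np


def knn_similarity(X: np.ndarray, k: int) -> np.ndarray:
    """Symmetric k-nearest-neighbour similarity graph with Gaussian weights."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)                  # exclude self-loops
    sigma2 = np.mean(d2[np.isfinite(d2)])         # global bandwidth (assumption)
    if sigma2 == 0.0:
        sigma2 = 1.0
    W = np.zeros(d2.shape)
    idx = np.argsort(d2, axis=1)[:, :k]           # k closest neighbours per row
    rows = np.repeat(np.arange(X.shape[0]), k)
    cols = idx.ravel()
    W[rows, cols] = np.exp(-d2[rows, cols] / sigma2)
    return np.maximum(W, W.T)                     # symmetrise


def transition_matrix(W: np.ndarray) -> np.ndarray:
    """Row-normalise a similarity graph into a Markov transition matrix P = D^-1 W."""
    degree = W.sum(axis=1, keepdims=True)
    degree[degree == 0.0] = 1.0                   # guard against isolated nodes
    return W / degree


# Two hypothetical views of the same six data fields.
rng = np.random.default_rng(0)
views = [rng.normal(size=(6, 3)), rng.normal(size=(6, 4))]

# Stack the per-view transition matrices into an n x n x V tensor.
P_stack = np.stack(
    [transition_matrix(knn_similarity(X, k=2)) for X in views], axis=2
)

# Stand-in for the low-rank tensor step: average the slices over the views.
P = P_stack.mean(axis=2)
```

Because each slice is row-stochastic, the averaged matrix `P` is row-stochastic as well, so it can be fed to a Markov chain based spectral clustering routine.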
Preferably, the S400 includes:
sorting the intersection coverage relations using the PageRank sorting method.
Preferably, the S500 includes:
S501, setting a sorting threshold for forming blood relationships between physical tables;
S502, filtering based on the sorting and the sorting threshold to form a blood relationship list between physical tables.
The invention provides an intelligent blood relationship identification recommendation system based on rules and machine learning, which comprises:
the characteristic information identification unit is used for constructing a machine learning model and identifying a plurality of pieces of characteristic information of all data fields based on the machine learning model; the characteristic information comprises the unique values, maximum value and minimum value of a field;
the clustering unit is used for clustering the data fields based on the machine learning model to obtain a plurality of clusters;
the intersection coverage relation determining unit is used for comparing the unique values of the data fields in each cluster based on the data pattern comparison rule and determining intersection coverage relations based on the unique values;
the sorting unit is used for sorting the intersection coverage relations;
and the blood relationship list forming unit is used for filtering based on the sorting and forming the blood relationship list among the physical tables after filtering.
Preferably, the system further comprises:
the recommending unit, which is used for recommending the top-ranked entries in the blood relationship list to a user; the user selects from the recommended upstream and downstream physical tables, and the selected table is added as a new feature to the calculation of the intersection coverage relation sorting.
Preferably, the clustering unit includes:
the semantic extraction subunit is used for performing text semantic extraction on the content of the data field based on the machine learning model to obtain the semantics of the data field;
and the characteristic clustering subunit is used for clustering the data fields according to the content, the type, the semantics and the labels to form a plurality of clusters containing different characteristics.
Preferably, the sorting unit includes:
and the PageRank sorting subunit is used for sorting the intersection coverage relation by adopting a PageRank sorting method.
Preferably, the blood relationship list forming unit includes:
a sorting threshold setting subunit, configured to set a sorting threshold to form a blood relationship between the physical tables;
and the filtering subunit is used for filtering based on the sorting and the sorting threshold value to form a blood relationship list between the physical table and the physical table.
Compared with the prior art, the invention has the following advantages:
The invention provides an intelligent blood relationship identification recommendation method and system based on rules and machine learning, comprising the following steps: constructing a machine learning model, and identifying a plurality of pieces of characteristic information of all data fields based on the machine learning model; the characteristic information comprises the unique values, maximum value and minimum value of a field; clustering the data fields based on the machine learning model to obtain a plurality of clusters; comparing the unique values of the data fields in each cluster based on a data pattern comparison rule, and determining intersection coverage relations based on the unique values; sorting the intersection coverage relations; and filtering based on the sorting to form a blood relationship list among the physical tables.
By combining the data pattern comparison rule with machine learning, the scheme adopted by the invention identifies and discovers the blood relationships of data and helps enterprises construct a data network. The cost of enterprise data governance is greatly reduced, and the efficiency of data governance is effectively improved.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart of an intelligent blood relationship identification recommendation method based on rules and machine learning according to an embodiment of the present invention;
FIG. 2 is a diagram of the identification recommendation interface of the intelligent blood relationship identification recommendation method based on rules and machine learning according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an intelligent blood relationship identification recommendation system based on rules and machine learning according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
An embodiment of the invention provides an intelligent blood relationship identification recommendation method based on rules and machine learning. Referring to FIG. 1, the method comprises the following steps:
S100, constructing a machine learning model, and identifying a plurality of pieces of characteristic information of all data fields based on the machine learning model; the characteristic information comprises the unique values, maximum value and minimum value of a field;
S200, clustering the data fields based on the machine learning model to obtain a plurality of clusters;
S300, comparing the unique values of the data fields in each cluster based on a data pattern comparison rule, and determining intersection coverage relations based on the unique values;
S400, sorting the intersection coverage relations;
S500, filtering based on the sorting, and forming a blood relationship list among the physical tables after filtering.
The working principle of the technical scheme is as follows: in this embodiment, a machine learning model is constructed, and a plurality of pieces of characteristic information of all data fields are identified based on the machine learning model; the characteristic information comprises the unique values, maximum value and minimum value of a field; the data fields are clustered based on the machine learning model to obtain a plurality of clusters; the unique values of the data fields in each cluster are compared based on a data pattern comparison rule, and intersection coverage relations are determined based on the unique values; the intersection coverage relations are sorted; and filtering is performed based on the sorting to form a blood relationship list among the physical tables.
The beneficial effects of the above technical scheme are: with the scheme provided by this embodiment, a machine learning model is constructed, and a plurality of pieces of characteristic information of all data fields are identified based on the machine learning model; the characteristic information comprises the unique values, maximum value and minimum value of a field; the data fields are clustered based on the machine learning model to obtain a plurality of clusters; the unique values of the data fields in each cluster are compared based on a data pattern comparison rule, and intersection coverage relations are determined based on the unique values; the intersection coverage relations are sorted; and filtering is performed based on the sorting to form a blood relationship list among the physical tables.
By combining the data pattern comparison rule with machine learning, the scheme adopted by this embodiment identifies and discovers the blood relationships of data and helps enterprises construct a data network. The cost of enterprise data governance is greatly reduced, and the efficiency of data governance is effectively improved.
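The comparison of unique values in step S300 can be pictured with a small sketch. The text does not define how the intersection coverage relation is scored, so the directed ratio used here (the share of one field's unique values that also appear in another field) is an assumption, as are the function names and the sample fields.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple


def coverage(source: Set, target: Set) -> float:
    """Share of the source field's unique values that also occur in the target."""
    if not source:
        return 0.0
    return len(source & target) / len(source)


def coverage_relations(cluster: Dict[str, Set]) -> List[Tuple[str, str, float]]:
    """Directed coverage scores for every ordered pair of fields in one cluster."""
    return [
        (a, b, coverage(cluster[a], cluster[b]))
        for a, b in permutations(cluster, 2)
    ]


# Hypothetical cluster of fields keyed as "table.column".
cluster = {
    "orders.customer_id": {1, 2, 3},
    "customers.id": {1, 2, 3, 4},
}
relations = coverage_relations(cluster)
```

Restricting the pairwise comparison to fields inside one cluster is what keeps the number of comparisons manageable; a full coverage of 1.0 in one direction suggests that the source field may be derived from, or reference, the target field.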
In another embodiment, after step S500, the method further includes:
S600, recommending the top-ranked entries in the blood relationship list to a user; the user selects from the recommended upstream and downstream physical tables, and the selected table is added as a new feature to the calculation of the intersection coverage relation sorting.
The working principle of the technical scheme is as follows: in this embodiment, the top-ranked entries in the blood relationship list are recommended to the user; the user selects from the recommended upstream and downstream physical tables, and the selected table is added as a new feature to the calculation of the intersection coverage relation sorting.
Referring to FIG. 2, after the corresponding blood relationship list is generated, the data blood relationship system can recommend the upstream and downstream physical tables (the automatic classification result) to the user; the user can select the corresponding physical table according to the classification result, and the selection participates in subsequent calculation as a new feature.
The beneficial effects of the above technical scheme are: with the scheme provided by this embodiment, the top-ranked entries in the blood relationship list are recommended to the user; the user selects from the recommended upstream and downstream physical tables, and the selected table is added as a new feature to the calculation of the intersection coverage relation sorting.
In another embodiment, the S200 includes:
S201, performing text semantic extraction on the content of the data fields based on the machine learning model to obtain the semantics of the data fields;
S202, clustering the data fields according to content, type, semantics and labels to form a plurality of clusters with different characteristics.
The working principle of the technical scheme is as follows: in this embodiment, text semantic extraction is performed on the content of the data fields based on the machine learning model to obtain the semantics of the data fields; and the data fields are clustered according to content, type, semantics and labels to form a plurality of clusters with different characteristics.
Applicable clustering methods include the k-means clustering algorithm, the hierarchical clustering algorithm and the spectral clustering algorithm.
In addition, text semantics can be extracted with a semantic extraction model. The model converts the input text into word vectors; a one-dimensional convolutional structure without pooling layers processes the word vectors to obtain dual-granularity features, and a dropout layer prevents overfitting. A global attention mechanism then uses the context information and the hidden-unit information to obtain a weight vector for each part and to assign the weights, and the text classification is obtained through an activation function and a fully connected layer, thereby achieving text semantic extraction.
The beneficial effects of the above technical scheme are: with the scheme provided by this embodiment, text semantic extraction is performed on the content of the data fields based on the machine learning model to obtain the semantics of the data fields; and the data fields are clustered according to content, type, semantics and labels to form a plurality of clusters with different characteristics.
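Of the clustering algorithms listed for this embodiment, k-means is the simplest to sketch. The example below is a plain Lloyd-style k-means in NumPy with a deterministic initialisation. The numeric feature vectors (a numeric-type flag, a distinct-count magnitude, a mean value length) are hypothetical stand-ins for the content, type, semantics and label features, which would first have to be encoded as numbers.

```python
import numpy as np


def kmeans(X: np.ndarray, k: int, iters: int = 50) -> np.ndarray:
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    centers = X[:k].astype(float).copy()   # deterministic init: first k rows
    labels = None
    for _ in range(iters):
        # Distance from every point to every centre, shape (n, k).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                          # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):               # keep the old centre if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


# Hypothetical fields and hand-made feature vectors, for illustration only.
fields = ["orders.customer_id", "orders.note", "customers.id", "customers.remark"]
X = np.array([
    [1.0, 3.0, 4.0],
    [0.0, 2.0, 40.0],
    [1.0, 3.1, 4.0],
    [0.0, 2.2, 38.0],
])
labels = kmeans(X, k=2)
clusters = {f: int(c) for f, c in zip(fields, labels)}
```

In this toy input the two identifier-like fields land in one cluster and the two free-text fields in the other, so only the identifier pair would go on to the unique-value comparison.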
In another embodiment, the S400 includes:
sorting the intersection coverage relations using the PageRank sorting method.
The working principle of the technical scheme is as follows: in this embodiment, S400 includes:
sorting the intersection coverage relations using the PageRank sorting method.
PageRank is a method for identifying the rank or importance of web pages by calculating the rank of the web pages based on their mutual link relationship. The PageRank algorithm calculates the PageRank value for each web page and then ranks the importance of the web page according to the size of the value.
The beneficial effects of the above technical scheme are: with the scheme provided by this embodiment, S400 includes:
sorting the intersection coverage relations using the PageRank sorting method.
In another embodiment, the S500 includes:
S501, setting a sorting threshold for forming blood relationships between physical tables;
S502, filtering based on the sorting and the sorting threshold to form a blood relationship list between physical tables.
The working principle of the technical scheme is as follows: in this embodiment, S500 includes:
S501, setting a sorting threshold for forming blood relationships between physical tables;
S502, filtering based on the sorting and the sorting threshold to form a blood relationship list between physical tables.
The beneficial effects of the above technical scheme are: with the scheme provided by this embodiment, S500 includes:
S501, setting a sorting threshold for forming blood relationships between physical tables;
S502, filtering based on the sorting and the sorting threshold to form a blood relationship list between physical tables.
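Steps S501 and S502 can be sketched as a threshold filter followed by a roll-up from fields to physical tables. The assumptions here are that scores at or above the threshold are kept, that fields are named "table.column", and that the best field-level score represents a table pair; the function name, the threshold value and the sample scores are hypothetical.

```python
from typing import Dict, List, Tuple


def lineage_list(scored: List[Tuple[str, str, float]],
                 threshold: float) -> List[Tuple[str, str, float]]:
    """Keep field-level relations at or above the threshold and roll them up to tables."""
    best: Dict[Tuple[str, str], float] = {}
    for src, dst, score in scored:
        if score < threshold:
            continue                                   # below the sorting threshold
        pair = (src.split(".")[0], dst.split(".")[0])  # "table.column" -> "table"
        if pair[0] != pair[1]:                         # ignore relations inside one table
            best[pair] = max(score, best.get(pair, 0.0))
    return sorted(((s, d, v) for (s, d), v in best.items()),
                  key=lambda r: r[2], reverse=True)


# Hypothetical sorted relations between fields.
scored = [
    ("orders.customer_id", "customers.id", 0.95),
    ("orders.note", "customers.remark", 0.10),
    ("order_items.order_id", "orders.id", 0.80),
]
table_lineage = lineage_list(scored, threshold=0.5)
```

With these inputs the low-scoring free-text match is dropped and two table-level relations remain, ordered by score, which is the shape of list that would then be offered to the user for confirmation.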
In another embodiment, an intelligent blood relationship identification recommendation system based on rules and machine learning is further provided. Referring to FIG. 3, the system includes:
the characteristic information identification unit is used for constructing a machine learning model and identifying a plurality of pieces of characteristic information of all data fields based on the machine learning model; the characteristic information comprises a unique value, a maximum value and a minimum value of a field;
the clustering unit is used for clustering the data fields based on the machine learning model to obtain a plurality of clusters;
the intersection coverage relation determining unit is used for comparing the unique values of the data fields in each cluster based on the data pattern comparison rule and determining intersection coverage relations based on the unique values;
the sorting unit is used for sorting the intersection coverage relations;
and the blood relationship list forming unit is used for filtering based on the sorting and forming the blood relationship list among the physical tables after filtering.
The working principle of the technical scheme is as follows: in this embodiment, the system comprises: the characteristic information identification unit, used for constructing a machine learning model and identifying a plurality of pieces of characteristic information of all data fields based on the machine learning model, the characteristic information comprising the unique values, maximum value and minimum value of a field; the clustering unit, used for clustering the data fields based on the machine learning model to obtain a plurality of clusters; the intersection coverage relation determining unit, used for comparing the unique values of the data fields in each cluster based on the data pattern comparison rule and determining intersection coverage relations based on the unique values; the sorting unit, used for sorting the intersection coverage relations; and the blood relationship list forming unit, used for filtering based on the sorting and forming the blood relationship list among the physical tables after filtering.
The beneficial effects of the above technical scheme are: with the scheme provided by this embodiment, the system comprises: the characteristic information identification unit, used for constructing a machine learning model and identifying a plurality of pieces of characteristic information of all data fields based on the machine learning model, the characteristic information comprising the unique values, maximum value and minimum value of a field; the clustering unit, used for clustering the data fields based on the machine learning model to obtain a plurality of clusters; the intersection coverage relation determining unit, used for comparing the unique values of the data fields in each cluster based on the data pattern comparison rule and determining intersection coverage relations based on the unique values; the sorting unit, used for sorting the intersection coverage relations; and the blood relationship list forming unit, used for filtering based on the sorting and forming the blood relationship list among the physical tables after filtering.
By combining the data pattern comparison rule with machine learning, the scheme adopted by this embodiment identifies and discovers the blood relationships of data and helps enterprises construct a data network. The cost of enterprise data governance is greatly reduced, and the efficiency of data governance is effectively improved.
In another embodiment, the system further comprises:
the recommending unit, which is used for recommending the top-ranked entries in the blood relationship list to the user; the user selects from the recommended upstream and downstream physical tables, and the selected table is added as a new feature to the calculation of the intersection coverage relation sorting.
The working principle of the technical scheme is as follows: in this embodiment, the system further comprises:
the recommending unit, which is used for recommending the top-ranked entries in the blood relationship list to the user; the user selects from the recommended upstream and downstream physical tables, and the selected table is added as a new feature to the calculation of the intersection coverage relation sorting.
The beneficial effects of the above technical scheme are: with the scheme provided by this embodiment, the system further comprises:
the recommending unit, which is used for recommending the top-ranked entries in the blood relationship list to the user; the user selects from the recommended upstream and downstream physical tables, and the selected table is added as a new feature to the calculation of the intersection coverage relation sorting.
In another embodiment, the clustering unit comprises:
the semantic extraction subunit, used for performing text semantic extraction on the content of the data fields based on the machine learning model to obtain the semantics of the data fields;
and the characteristic clustering subunit, used for clustering the data fields according to content, type, semantics and labels to form a plurality of clusters containing different characteristics.
The working principle of the above technical scheme is as follows: in the scheme adopted by this embodiment, the clustering unit comprises:
the semantic extraction subunit, used for performing text semantic extraction on the content of the data fields based on the machine learning model to obtain the semantics of the data fields;
and the characteristic clustering subunit, used for clustering the data fields according to content, type, semantics and labels to form a plurality of clusters containing different characteristics.
The beneficial effects of the above technical scheme are as follows: in the scheme provided by this embodiment, the clustering unit comprises:
the semantic extraction subunit, used for performing text semantic extraction on the content of the data fields based on the machine learning model to obtain the semantics of the data fields;
and the characteristic clustering subunit, used for clustering the data fields according to content, type, semantics and labels to form a plurality of clusters containing different characteristics.
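The clustering step can be illustrated with a deliberately simple stand-in. The text relies on a machine learning model for semantic extraction; the sketch below does not reproduce that model. Instead it assumes that tokens of the field name approximate the semantics, combines them with the type and labels into a feature set, and groups fields greedily by Jaccard similarity. The functions `field_features`, `jaccard` and `cluster_fields` and the threshold of 0.4 are hypothetical.

```python
import re


def field_features(name, dtype, labels):
    """Build a feature set from name tokens (stand-in for semantics), type and labels."""
    tokens = {t for t in re.split(r"[^a-z0-9]+", name.lower()) if t}
    return tokens | {f"type:{dtype}"} | {f"label:{label}" for label in labels}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_fields(fields, threshold=0.4):
    """Greedy single-pass grouping: join the first cluster whose seed is similar enough."""
    clusters = []  # each cluster is a list of (field_key, features)
    for key, feats in fields.items():
        for cluster in clusters:
            if jaccard(feats, cluster[0][1]) >= threshold:
                cluster.append((key, feats))
                break
        else:
            clusters.append([(key, feats)])
    return [[key for key, _ in cluster] for cluster in clusters]


# Illustrative fields only.
fields = {
    "orders.customer_id": field_features("customer_id", "int", ["id"]),
    "customers.customer_id": field_features("customer_id", "int", ["id"]),
    "orders.created_at": field_features("created_at", "timestamp", ["time"]),
}
clusters = cluster_fields(fields)
print(clusters)
```

Restricting the later unique-value comparison to fields inside the same cluster keeps the number of pairwise comparisons small, which appears to be the purpose of clustering before the rule-based comparison.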
In another embodiment, the sorting unit comprises:
the PageRank sorting subunit, used for sorting the intersection coverage relationships by the PageRank sorting method.
The working principle of the above technical scheme is as follows: in the scheme adopted by this embodiment, the sorting unit comprises:
the PageRank sorting subunit, used for sorting the intersection coverage relationships by the PageRank sorting method.
The beneficial effects of the above technical scheme are as follows: in the scheme provided by this embodiment, the sorting unit comprises:
the PageRank sorting subunit, used for sorting the intersection coverage relationships by the PageRank sorting method.
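A PageRank-based sorting can be sketched as below. The text names PageRank but does not describe how the graph is built, so the following choices are assumptions: fields are nodes, each intersection coverage relationship is a weighted edge pointing from the covered field to the covering field, and a relationship is scored by its coverage multiplied by the rank of its target. The functions `pagerank` and `sort_relationships` are hypothetical names; the iteration itself is the standard weighted power iteration with damping.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration over (source, target, weight) edges."""
    nodes = sorted({n for src, dst, _ in edges for n in (src, dst)})
    if not nodes:
        return {}
    positive = [(s, d, w) for s, d, w in edges if w > 0]
    out_weight = {n: 0.0 for n in nodes}
    for src, _, w in positive:
        out_weight[src] += w
    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for src, dst, w in positive:
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


def sort_relationships(edges, rank):
    """Assumed scoring: coverage weight times the PageRank of the target field."""
    return sorted(edges, key=lambda e: (-e[2] * rank[e[1]], e[0], e[1]))


# Illustrative coverage relationships only.
edges = [
    ("orders.customer_id", "customers.id", 1.0),
    ("invoices.customer_id", "customers.id", 0.9),
    ("customers.id", "orders.customer_id", 0.75),
]
rank = pagerank(edges)
ordered = sort_relationships(edges, rank)
print(max(rank, key=rank.get))
print(ordered[0])
```

In this toy graph the field referenced by several others receives the highest rank, so relationships that point to it are sorted ahead of the rest.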
In another embodiment, the blood relationship list forming unit comprises:
the sorting threshold setting subunit, used for setting a sorting threshold for forming blood relationships between physical tables;
and the filtering subunit, used for filtering based on the sorting and the sorting threshold to form the blood relationship list between physical tables.
The working principle of the above technical scheme is as follows: in the scheme adopted by this embodiment, the blood relationship list forming unit comprises:
the sorting threshold setting subunit, used for setting a sorting threshold for forming blood relationships between physical tables;
and the filtering subunit, used for filtering based on the sorting and the sorting threshold to form the blood relationship list between physical tables.
The beneficial effects of the above technical scheme are as follows: in the scheme provided by this embodiment, the blood relationship list forming unit comprises:
the sorting threshold setting subunit, used for setting a sorting threshold for forming blood relationships between physical tables;
and the filtering subunit, used for filtering based on the sorting and the sorting threshold to form the blood relationship list between physical tables.
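The threshold filtering can be sketched as follows, under stated assumptions: the sorting threshold is treated as a minimum score (it could equally be a rank position, which the text leaves open), relationships of a table with itself are dropped, and only the best-scoring relationship per table pair is kept. The function `build_blood_relationship_list` is a hypothetical name.

```python
def build_blood_relationship_list(sorted_relations, threshold):
    """Filter (upstream_table, downstream_table, score) entries by a score threshold.

    Keeps the highest score for each table pair and returns the result
    ordered by descending score.
    """
    best = {}
    for upstream, downstream, score in sorted_relations:
        if score < threshold or upstream == downstream:
            continue
        pair = (upstream, downstream)
        best[pair] = max(score, best.get(pair, score))
    return sorted(
        ((up, down, score) for (up, down), score in best.items()),
        key=lambda row: (-row[2], row[0], row[1]),
    )


# Illustrative scores only.
relations = [
    ("customers", "orders", 0.9),
    ("customers", "orders", 0.6),
    ("customers", "invoices", 0.4),
    ("orders", "orders", 0.95),
]
blood_relationships = build_blood_relationship_list(relations, threshold=0.5)
print(blood_relationships)
```

Raising the threshold shortens the list and favors precision; lowering it surfaces more candidate relationships for the user to review.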
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. An intelligent blood relationship identification recommendation method based on rules and machine learning, characterized by comprising the following steps:
S100, constructing a machine learning model, and identifying a plurality of pieces of characteristic information of all data fields based on the machine learning model; the characteristic information comprises the unique values, the maximum value and the minimum value of a field;
S200, clustering the data fields based on the machine learning model to obtain a plurality of clusters;
S300, comparing the unique values of the data fields in each cluster based on the data pattern comparison rule, and determining the intersection coverage relationship based on the unique values;
S400, sorting the intersection coverage relationships;
S500, performing sorting and filtering based on the sorting result, and forming a blood relationship list among the physical tables after filtering.
2. The intelligent blood relationship identification recommendation method based on rules and machine learning of claim 1, further comprising, after step S500:
S600, recommending the top-ranked entries in the blood relationship list to a user for selection, the user selecting from the recommended upstream and downstream physical tables, and adding the selected table as a new feature to the calculation of the sorting of the intersection coverage relationships.
3. The intelligent blood relationship identification recommendation method based on rules and machine learning of claim 1, wherein S200 comprises:
S201, performing text semantic extraction on the content of the data fields based on the machine learning model to obtain the semantics of the data fields;
S202, clustering the data fields according to content, type, semantics and labels to form a plurality of clusters containing different characteristics.
4. The intelligent blood relationship identification recommendation method based on rules and machine learning of claim 1, wherein S400 comprises:
sorting the intersection coverage relationships by the PageRank sorting method.
5. The intelligent blood relationship identification recommendation method based on rules and machine learning of claim 1, wherein S500 comprises:
S501, setting a sorting threshold for forming blood relationships between physical tables;
S502, filtering based on the sorting and the sorting threshold to form a blood relationship list between physical tables.
6. An intelligent blood relationship identification recommendation system based on rules and machine learning, comprising:
the characteristic information identification unit, used for constructing a machine learning model and identifying a plurality of pieces of characteristic information of all data fields based on the machine learning model; the characteristic information comprises the unique values, the maximum value and the minimum value of a field;
the clustering unit, used for clustering the data fields based on the machine learning model to obtain a plurality of clusters;
the intersection coverage relationship determining unit, used for comparing the unique values of the data fields in each cluster based on the data pattern comparison rule and determining the intersection coverage relationship based on the unique values;
the sorting unit, used for sorting the intersection coverage relationships;
and the blood relationship list forming unit, used for performing sorting and filtering based on the sorting result and forming a blood relationship list among the physical tables after filtering.
7. The intelligent blood relationship identification recommendation system based on rules and machine learning of claim 6, further comprising:
the recommending unit, used for recommending the top-ranked entries in the blood relationship list to the user for selection; the user selects from the recommended upstream and downstream physical tables, and the selected table is added as a new feature to the calculation of the sorting of the intersection coverage relationships.
8. The intelligent blood relationship identification recommendation system based on rules and machine learning of claim 6, wherein the clustering unit comprises:
the semantic extraction subunit, used for performing text semantic extraction on the content of the data fields based on the machine learning model to obtain the semantics of the data fields;
and the characteristic clustering subunit, used for clustering the data fields according to content, type, semantics and labels to form a plurality of clusters containing different characteristics.
9. The intelligent blood relationship identification recommendation system based on rules and machine learning of claim 6, wherein the sorting unit comprises:
the PageRank sorting subunit, used for sorting the intersection coverage relationships by the PageRank sorting method.
10. The intelligent blood relationship identification recommendation system based on rules and machine learning of claim 6, wherein the blood relationship list forming unit comprises:
the sorting threshold setting subunit, used for setting a sorting threshold for forming blood relationships between physical tables;
and the filtering subunit, used for filtering based on the sorting and the sorting threshold to form a blood relationship list between physical tables.
CN202210766523.1A 2022-06-30 2022-06-30 Intelligent blood relationship identification recommendation method and system based on rules and machine learning Active CN115374223B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210766523.1A CN115374223B (en) 2022-06-30 2022-06-30 Intelligent blood relationship identification recommendation method and system based on rules and machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210766523.1A CN115374223B (en) 2022-06-30 2022-06-30 Intelligent blood relationship identification recommendation method and system based on rules and machine learning

Publications (2)

Publication Number Publication Date
CN115374223A (en) 2022-11-22
CN115374223B (en) 2023-06-13

Family

ID=84061200

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210766523.1A Active CN115374223B (en) 2022-06-30 2022-06-30 Intelligent blood relationship identification recommendation method and system based on rules and machine learning

Country Status (1)

Country Link
CN (1) CN115374223B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180039890A1 (en) * 2016-08-03 2018-02-08 Electronics And Telecommunications Research Institute Adaptive knowledge base construction method and system
CN110083639A (en) * 2019-04-25 2019-08-02 中电科嘉兴新型智慧城市科技发展有限公司 A kind of method and device that the data blood relationship based on clustering is intelligently traced to the source
CN113469280A (en) * 2021-07-22 2021-10-01 烽火通信科技股份有限公司 Data blood margin discovery method, system and device based on graph neural network

Also Published As

Publication number Publication date
CN115374223B (en) 2023-06-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant