CN110781309A - Entity parallel relation similarity calculation method based on pattern matching

Info

Publication number: CN110781309A
Application number: CN201910583113.1A
Authority: CN
Inventors: 刘家祥
Original assignee: Central Mdt Infotech Ltd Of United States Of Xiamen
Current assignee: Central Mdt Infotech Ltd Of United States Of Xiamen
Priority date: 2019-07-01
Filing date: 2019-07-01
Publication date: 2020-02-11

Abstract

A method for calculating the similarity of entity parallel relations based on pattern matching comprises the following steps: constructing a knowledge graph A and a database B; inputting initial data C for which pattern matching has succeeded; inputting the initial data C into the knowledge graph A to obtain an entity fragment information group D, clustering the group D, and establishing a first co-word network F from the resulting first clustered entity information E; searching the database B for corpora that include the entity fragment information group D, and inputting the resulting corpus group G into the knowledge graph A to obtain the entity information H contained in the corpus group G; clustering the entity information H, and establishing a second co-word network J corresponding to the corpus group G from the resulting second clustered entity information I; and calculating the similarity between the first co-word network F and the second co-word network J. The invention can shorten the time a user needs to retrieve the required information from the database, thereby improving working efficiency.

Description

Entity parallel relation similarity calculation method based on pattern matching

Technical Field

The invention relates to the technical field of data processing, and in particular to a method for calculating the similarity of entity parallel relations based on pattern matching.

Background

Entity similarity calculation has many applications; a typical use of a similarity model is to find other entities similar to a given entity. With the development of information network technology, the amount of information on networks grows exponentially. When information on related subjects needs to be compiled, the volume of data involved cannot be estimated, so relying on manual processing alone inevitably wastes considerable human resources, takes a long time to obtain the required information, and is prone to deviations. The present application therefore provides an entity parallel relation similarity calculation method based on pattern matching.

Disclosure of Invention

(I) Objects of the invention

In order to solve the technical problems in the background art, the invention provides a method for calculating the similarity of entity parallel relations based on pattern matching.

(II) Technical scheme

In order to solve the problems, the invention provides an entity parallel relation similarity calculation method based on pattern matching, which comprises the following specific steps:

S1, constructing a knowledge graph A and a database B;

S2, inputting initial data C for which pattern matching has succeeded;

S3, inputting the initial data C into the knowledge graph A to obtain an entity fragment information group D;

S4, clustering the entity fragment information group D to obtain first clustered entity information E;

S5, establishing a first co-word network F corresponding to the initial data C according to the first clustered entity information E;

S6, searching the database B for corpora comprising the entity fragment information group D to obtain a corpus group G;

S7, inputting the corpus group G into the knowledge graph A to obtain entity information H contained in the corpus group G;

S8, clustering the entity information H to obtain second clustered entity information I;

S9, establishing a second co-word network J corresponding to the corpus group G according to the second clustered entity information I;

S10, calculating the similarity between the first co-word network F and the second co-word network J.
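By way of illustration only, steps S5, S9 and S10 may be sketched as follows. The text does not specify how a co-word network is built or which similarity measure is used, so this sketch assumes that a co-word network counts how often two entities appear in the same group, and that the similarity is the cosine of the two networks' edge-weight vectors. The names build_coword_network and cosine_similarity and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def build_coword_network(groups):
    """Count how often each unordered pair of entities shares a group.

    `groups` is an iterable of entity collections (for example one per
    cluster); the result maps a sorted pair to its co-occurrence count.
    """
    network = Counter()
    for group in groups:
        # Deduplicate and sort so (a, b) and (b, a) map to the same edge.
        for pair in combinations(sorted(set(group)), 2):
            network[pair] += 1
    return network


def cosine_similarity(net_f, net_j):
    """Cosine similarity of two networks viewed as edge-weight vectors.

    This measure is an assumption; the source does not name one.
    """
    dot = sum(weight * net_j.get(edge, 0) for edge, weight in net_f.items())
    norm_f = sqrt(sum(w * w for w in net_f.values()))
    norm_j = sqrt(sum(w * w for w in net_j.values()))
    if norm_f == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_f * norm_j)


# Hypothetical clustered entity information E and I.
clusters_e = [["apple", "pear", "peach"], ["apple", "pear"]]
clusters_i = [["apple", "pear"], ["pear", "plum"]]

network_f = build_coword_network(clusters_e)
network_j = build_coword_network(clusters_i)
similarity = cosine_similarity(network_f, network_j)
```

A value close to 1 would indicate that the two networks share most of their weighted edges, and a value of 0 that they share none; any threshold for deciding that the corpus group G is relevant would have to be chosen separately.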

Preferably, the initial data C includes structured data, unstructured data and semi-structured data.

Preferably, the specific steps of inputting the initial data C into the knowledge graph A and processing it in S3 are as follows:

S31, converting the initial data C into structured data K; and

S32, performing word segmentation on the structured data K to obtain the entity fragment information group D.

Preferably, after word segmentation is performed on the structured data K, the segmented information needs to be filtered.
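By way of illustration only, S31, S32 and the subsequent filtering may be sketched as follows. The sketch assumes that structuring means normalising each record into a dictionary with a single text field, and it uses a regular expression as a stand-in for word segmentation; Chinese text would need a dedicated word segmenter instead. The stop-word list and all function names are hypothetical.

```python
import re

# Hypothetical stop-word list; a real deployment would use a curated one.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "is", "are", "to"}


def to_structured(record):
    """S31 (sketch): normalise a raw record into a dict with a 'text' field."""
    if isinstance(record, dict):
        # Structured or semi-structured input: join the field values.
        text = " ".join(str(value) for value in record.values())
    else:
        # Unstructured input: treat the record as plain text.
        text = str(record)
    return {"text": text}


def segment(structured):
    """S32 (sketch): split the text into lower-case word tokens."""
    return re.findall(r"\w+", structured["text"].lower())


def filter_tokens(tokens):
    """Drop stop words, one-character tokens and repeats, keeping order."""
    seen = set()
    kept = []
    for token in tokens:
        if token in STOP_WORDS or len(token) < 2 or token in seen:
            continue
        seen.add(token)
        kept.append(token)
    return kept


# Hypothetical initial data C: one unstructured and one semi-structured record.
initial_data_c = [
    "The apple and the pear are fruits",
    {"title": "Peach", "body": "A peach is a fruit"},
]
fragments_d = [filter_tokens(segment(to_structured(r))) for r in initial_data_c]
```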

Preferably, in S3, the initial data C is input into the knowledge graph A to obtain a knowledge graph A1 related to the initial data C.

Preferably, in S7, the corpus group G is input into the knowledge graph A to obtain a knowledge graph A2 associated with the corpus group G.

Preferably, in S6, the retrieved corpus information needs to be screened before the corpus group G is obtained, and the specific method is as follows: detecting whether the retrieved corpus information contains an entity; if so, aggregating the corpus information to obtain the corpus group G; and if not, discarding the corpus information that contains no entity.
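By way of illustration only, the screening in S6 may be sketched as follows. The sketch assumes that an entity is detected by a case-insensitive substring test against a set of known entity names, which is a simplification (it would, for example, also find "apple" inside "pineapple"). The function names and sample data are hypothetical.

```python
def contains_entity(text, known_entities):
    """Return True if any known entity name occurs in the text."""
    lowered = text.lower()
    return any(entity.lower() in lowered for entity in known_entities)


def screen_corpus(candidates, known_entities):
    """Keep only the corpus items that mention at least one entity."""
    return [text for text in candidates if contains_entity(text, known_entities)]


# Hypothetical entity names taken from the knowledge graph A.
entities = {"apple", "pear"}
retrieved = [
    "Apple harvests rose this year.",
    "The weather was mild.",
    "Pear and apple orchards expanded.",
]
# Items without any entity are discarded; the rest form the corpus group G.
corpus_group_g = screen_corpus(retrieved, entities)
```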

The technical scheme of the invention has the following beneficial technical effects:

When the method is used and related information needs to be obtained from a database, pattern matching is first performed on the information to be obtained. Pattern matching identifies different pieces of information that share the same attribute, which yields the initial data C having that attribute. The initial data C is input into the knowledge graph A to obtain the entity fragment information group D; the entity fragment information group D is clustered, and the first co-word network F corresponding to the initial data C is established.

A corpus group G comprising the entity fragment information group D is obtained from the database B; the corpus group G is input into the knowledge graph A, the obtained entity information H is clustered, and the second co-word network J corresponding to the corpus group G is established. The similarity between the first co-word network F and the second co-word network J is then calculated to determine whether the corpus group G obtained from the database B is the required information. Data processing efficiency is thereby improved, and information retrieval time is greatly reduced.
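By way of illustration only, the clustering in S4 and S8 may be sketched as follows. The text does not name a clustering algorithm, so this sketch assumes that each entity in the knowledge graph A carries a category label and that fragments are grouped by that label. The graph contents and the function name cluster_by_category are hypothetical.

```python
from collections import defaultdict

# Hypothetical slice of the knowledge graph A: entity name -> category.
KNOWLEDGE_GRAPH_A = {
    "apple": "fruit",
    "pear": "fruit",
    "carrot": "vegetable",
    "spinach": "vegetable",
}


def cluster_by_category(fragments, graph):
    """Group fragments by their category in the graph.

    Fragments unknown to the graph are skipped, and repeats are ignored.
    """
    clusters = defaultdict(list)
    for fragment in fragments:
        category = graph.get(fragment)
        if category is not None and fragment not in clusters[category]:
            clusters[category].append(fragment)
    return dict(clusters)


# Hypothetical entity fragment information group D.
fragments_d = ["apple", "carrot", "pear", "table", "apple"]
clusters_e = cluster_by_category(fragments_d, KNOWLEDGE_GRAPH_A)
```

Grouping by a graph label keeps the sketch deterministic; a distance-based algorithm over entity embeddings would be an equally valid reading of the clustering step.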

Drawings

Fig. 1 is a flowchart of the entity parallel relation similarity calculation method based on pattern matching according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings in conjunction with the following detailed description. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.

As shown in Fig. 1, the entity parallel relation similarity calculation method based on pattern matching provided by the present invention comprises the following specific steps:

S1, constructing a knowledge graph A and a database B;

S2, inputting initial data C for which pattern matching has succeeded;

In an alternative embodiment, the initial data C includes structured data, unstructured data, and semi-structured data.

In an alternative embodiment, the specific steps of inputting the initial data C into the knowledge graph A and processing it in S3 are as follows:

S31, converting the initial data C into structured data K;

In an alternative embodiment, after word segmentation is performed on the structured data K, the segmented information needs to be filtered to remove interfering words and useless words.

In an alternative embodiment, in S3, the initial data C is input into the knowledge graph A to obtain a knowledge graph A1 related to the initial data C; a worker reviews the knowledge graph A1 as needed to determine whether it is useful, and when the knowledge graph A1 is determined to be useful, it is merged into the knowledge graph A.

In an alternative embodiment, in S7, the corpus group G is input into the knowledge graph A to obtain a knowledge graph A2 associated with the corpus group G; a worker reviews the knowledge graph A2 as needed to determine whether it is useful, and when the knowledge graph A2 is determined to be useful, it is merged into the knowledge graph A.
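By way of illustration only, merging a reviewed knowledge graph A1 or A2 into the knowledge graph A may be sketched as follows. The sketch assumes that a graph is stored as a dictionary mapping each entity to the set of its neighbours and that the worker's decision arrives as a boolean flag. The function name merge_subgraph and the sample graphs are hypothetical.

```python
def merge_subgraph(graph, subgraph, approved):
    """Return a new graph with `subgraph` merged in, if a reviewer approved it.

    Graphs are dicts mapping an entity to the set of its neighbours.
    When the subgraph is not approved, the original graph is returned as is.
    """
    if not approved:
        return graph
    # Copy the neighbour sets so the original graph is left unmodified.
    merged = {node: set(neighbours) for node, neighbours in graph.items()}
    for node, neighbours in subgraph.items():
        merged.setdefault(node, set()).update(neighbours)
    return merged


# Hypothetical knowledge graph A and reviewed subgraph A1.
graph_a = {"apple": {"fruit"}, "fruit": {"apple"}}
subgraph_a1 = {"pear": {"fruit"}, "fruit": {"pear"}}

graph_a = merge_subgraph(graph_a, subgraph_a1, approved=True)
```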

In an alternative embodiment, in S6, the retrieved corpus information needs to be screened before the corpus group G is obtained, and the specific method is as follows:

detecting whether the retrieved corpus information contains an entity;

if the corpus information is found to contain an entity, aggregating the corpus information to obtain the corpus group G;

and if the corpus information is found to contain no entity, discarding that corpus information.

In the invention, when related information needs to be obtained from a database, pattern matching is first performed on the information to be obtained. Pattern matching identifies different pieces of information that share the same attribute, which yields the initial data C having that attribute. The initial data C is input into the knowledge graph A to obtain the entity fragment information group D; the entity fragment information group D is clustered, and the first co-word network F corresponding to the initial data C is established. A corpus group G comprising the entity fragment information group D is obtained from the database B; the corpus group G is input into the knowledge graph A, the obtained entity information H is clustered, and the second co-word network J corresponding to the corpus group G is established. The similarity between the first co-word network F and the second co-word network J is then calculated to determine whether the corpus group G obtained from the database B is the required information. Data processing efficiency is thereby improved, and data retrieval time is greatly reduced.
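By way of illustration only, the pattern matching that selects the initial data C may be sketched as follows. The text does not state which patterns are used. This sketch assumes that a parallel relation is signalled by an enumeration such as "X, Y and Z" and matches it with a regular expression; both the pattern and the function name match_parallel_entities are hypothetical and are not taken from the original text.

```python
import re

# Assumed pattern (not from the source): a list of words separated by
# commas and closed by "and" or "or", e.g. "apples, pears and peaches".
ENUMERATION = re.compile(r"\b(\w+(?:, \w+)*),? (?:and|or) (\w+)")


def match_parallel_entities(sentence):
    """Return the words listed in parallel, or [] if no enumeration is found."""
    match = ENUMERATION.search(sentence)
    if match is None:
        return []
    head = match.group(1).split(", ")
    return head + [match.group(2)]


# Hypothetical input sentences; only those that match become initial data C.
sentences = [
    "We sell apples, pears and peaches.",
    "The weather was mild.",
]
initial_data_c = [s for s in sentences if match_parallel_entities(s)]
```

A single surface pattern of this kind will also match non-entity phrases such as verb pairs, so in practice the matched words would still need to be checked against the knowledge graph A.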

It is to be understood that the above-described embodiments of the present invention merely illustrate or explain the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention should be included in the protection scope of the present invention. Further, it is intended that the appended claims cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.

Claims

1. A method for calculating the similarity of entity parallel relations based on pattern matching, characterized by comprising the following specific steps:

S1, constructing a knowledge graph A and a database B;

S2, inputting initial data C for which pattern matching has succeeded;

2. The entity parallel relationship similarity calculation method based on pattern matching as claimed in claim 1, wherein the initial data C includes structured data, unstructured data and semi-structured data.

3. The entity parallel relationship similarity calculation method based on pattern matching as claimed in claim 1, wherein the specific steps of inputting the initial data C into the knowledge graph A and processing it in S3 are as follows:

S31, converting the initial data C into structured data K;

4. The entity parallel relationship similarity calculation method based on pattern matching as claimed in claim 3, wherein after word segmentation is performed on the structured data K, information after word segmentation is required to be filtered.

5. The entity parallel relationship similarity calculation method based on pattern matching as claimed in claim 1, wherein in S3 the initial data C is input into the knowledge graph A, resulting in a knowledge graph A1 related to the initial data C.

6. The entity parallel relationship similarity calculation method based on pattern matching as claimed in claim 1, wherein in S7 the corpus group G is input into the knowledge graph A to obtain a knowledge graph A2 associated with the corpus group G.

7. The entity parallel relationship similarity calculation method based on pattern matching as claimed in claim 1, wherein the corpus information retrieved in S6 needs to be screened before the corpus group G is obtained, and the specific method is as follows: detecting whether the retrieved corpus information contains an entity or not,