CN110781309A - Entity parallel relation similarity calculation method based on pattern matching - Google Patents

Entity parallel relation similarity calculation method based on pattern matching

Info

Publication number
CN110781309A
Authority
CN
China
Prior art keywords
entity
information
corpus
group
initial data
Prior art date
Legal status
Pending
Application number
CN201910583113.1A
Other languages
Chinese (zh)
Inventor
刘家祥
Current Assignee
Central Mdt Infotech Ltd Of United States Of Xiamen
Original Assignee
Central Mdt Infotech Ltd Of United States Of Xiamen
Priority date
2019-07-01
Filing date
2019-07-01
Publication date
2020-02-11
Application filed by Central Mdt Infotech Ltd Of United States Of Xiamen
Priority to CN201910583113.1A
Publication of CN110781309A
Pending (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G06F16/374 Thesaurus
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Human Resources & Organizations (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • General Business, Economics & Management (AREA)
  • Animal Behavior & Ethology (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A pattern-matching-based method for calculating the similarity of entity parallel relations comprises the following steps: constructing a knowledge graph A and a database B; inputting initial data C that has been successfully pattern-matched; inputting the initial data C into the knowledge graph A, clustering the resulting entity fragment information group D, and building a first co-word network F from the resulting first clustered entity information E; searching the database B for corpus entries that contain the entity fragment information group D, and inputting the resulting corpus group G into the knowledge graph A to obtain the entity information H contained in the corpus group G; clustering the entity information H and building, from the resulting second clustered entity information I, a second co-word network J corresponding to the corpus group G; and calculating the similarity between the first co-word network F and the second co-word network J. The invention can shorten the time a user spends retrieving the required information from the database, thereby improving working efficiency.

Description

Entity parallel relation similarity calculation method based on pattern matching
Technical Field
The invention relates to the technical field of data processing, and in particular to a pattern-matching-based method for calculating the similarity of entity parallel relations.
Background
Entity similarity calculation has many applications; a typical use of a similarity model is to find other entities similar to a given entity. With the development of network information technology, the amount of information online has grown exponentially. When information on a related subject needs to be compiled, the volume of data involved cannot be estimated in advance, so relying on manual management alone inevitably wastes considerable human resources, takes a long time to obtain the needed information, and often introduces deviations. The present application therefore provides a pattern-matching-based method for calculating the similarity of entity parallel relations.
Disclosure of Invention
(I) Objects of the invention
To solve the technical problems described in the background art, the invention provides a pattern-matching-based method for calculating the similarity of entity parallel relations.
(II) Technical solution
To solve the above problems, the invention provides a pattern-matching-based entity parallel relation similarity calculation method comprising the following specific steps:
S1, constructing a knowledge graph A and a database B;
S2, inputting initial data C that has been successfully pattern-matched;
S3, inputting the initial data C into the knowledge graph A to obtain an entity fragment information group D;
S4, clustering the entity fragment information group D to obtain first clustered entity information E;
S5, building, from the first clustered entity information E, a first co-word network F corresponding to the initial data C;
S6, searching the database B for corpus entries that contain the entity fragment information group D to obtain a corpus group G;
S7, inputting the corpus group G into the knowledge graph A to obtain the entity information H contained in the corpus group G;
S8, clustering the entity information H to obtain second clustered entity information I;
S9, building, from the second clustered entity information I, a second co-word network J corresponding to the corpus group G;
S10, calculating the similarity between the first co-word network F and the second co-word network J.
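The description does not name a clustering algorithm for S4 and S8. As a minimal sketch, assuming that "clustering" means grouping fragments that resolve to the same canonical entity in knowledge graph A, the Python below uses a hypothetical alias table `ALIASES` (lower-cased surface form to canonical entity and type) in place of the knowledge graph; `cluster_entities` and the sample data are illustrative only.

```python
from collections import defaultdict

# Hypothetical alias table standing in for knowledge graph A:
# lower-cased surface form -> (canonical entity, entity type)
ALIASES = {
    "xiamen": ("Xiamen", "City"),
    "amoy": ("Xiamen", "City"),
    "fujian": ("Fujian", "Province"),
    "fujian province": ("Fujian", "Province"),
}


def cluster_entities(fragments, aliases):
    """Group fragments (group D in S4, entity information H in S8) by the
    canonical entity they resolve to; unresolved fragments are dropped."""
    clusters = defaultdict(list)
    for fragment in fragments:
        key = aliases.get(fragment.lower())
        if key is not None:
            clusters[key].append(fragment)
    return dict(clusters)


if __name__ == "__main__":
    group_d = ["Xiamen", "Amoy", "Fujian Province", "weather"]
    for (entity, entity_type), members in cluster_entities(group_d, ALIASES).items():
        print(entity, entity_type, members)
```

Alias grouping is chosen here only because it needs no external libraries; a real system might instead cluster entity embeddings, and either choice is an assumption rather than the method stated above.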
Preferably, the initial data C includes structured data, unstructured data and semi-structured data.
Preferably, in S3 the initial data C is input into the knowledge graph A and processed as follows:
S31, converting the initial data C into structured data K;
S32, performing word segmentation on the structured data K to obtain the entity fragment information group D.
Preferably, the information produced by word segmentation of the structured data K is then filtered.
Preferably, inputting the initial data C into the knowledge graph A in S3 yields a knowledge graph A1 related to the initial data C.
Preferably, inputting the corpus group G into the knowledge graph A in S7 yields a knowledge graph A2 associated with the corpus group G.
Preferably, in S6 the retrieved corpus information is screened before the corpus group G is formed, as follows: each piece of retrieved corpus information is checked for entities; if it contains an entity, it is added to the corpus group G; otherwise, it is discarded.
The technical solution of the invention has the following beneficial technical effects:
When related information needs to be obtained from a database, the information to be obtained is first subjected to pattern matching, which identifies items of different information that share the same attribute; the initial data C having that attribute is selected and input into the knowledge graph A to obtain an entity fragment information group D. The entity fragment information group D is clustered, and a first co-word network F corresponding to the initial data C is built.
A corpus group G containing the entity fragment information group D is obtained from the database B; the corpus group G is input into the knowledge graph A, the resulting entity information H is clustered, and a second co-word network J corresponding to the corpus group G is built. The similarity between the first co-word network F and the second co-word network J is then calculated to determine whether the information in the corpus group G obtained from the database B is the required information. This improves data processing efficiency and greatly reduces information retrieval time.
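The pattern-matching step (S2) is not specified further. A minimal sketch, assuming that an "attribute" is expressed as a regular expression and that a record counts as successfully matched when the expression occurs in its text; the `PATTERNS` table and `select_initial_data` are hypothetical names.

```python
import re

# Hypothetical attribute patterns: each attribute is defined by a regular expression.
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def select_initial_data(records, attribute, patterns=PATTERNS):
    """Return the records that match the pattern for `attribute`; these records
    share that attribute and together form the initial data C."""
    pattern = patterns[attribute]
    return [record for record in records if pattern.search(record)]


if __name__ == "__main__":
    records = [
        "Meeting held on 2019-07-01 in Xiamen",
        "No date in this record",
        "Application published 2020-02-11",
    ]
    print(select_initial_data(records, "date"))
```

Regular expressions are only one way to express a pattern; structural templates or schema matching would fit the text equally well.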
Drawings
Fig. 1 is a flowchart of the pattern-matching-based entity parallel relation similarity calculation method according to the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings. It should be understood that this description is exemplary only and is not intended to limit the scope of the invention. Moreover, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts of the invention.
As shown in Fig. 1, the pattern-matching-based entity parallel relation similarity calculation method provided by the present invention comprises the following specific steps:
S1, constructing a knowledge graph A and a database B;
S2, inputting initial data C that has been successfully pattern-matched;
S3, inputting the initial data C into the knowledge graph A to obtain an entity fragment information group D;
S4, clustering the entity fragment information group D to obtain first clustered entity information E;
S5, building, from the first clustered entity information E, a first co-word network F corresponding to the initial data C;
S6, searching the database B for corpus entries that contain the entity fragment information group D to obtain a corpus group G;
S7, inputting the corpus group G into the knowledge graph A to obtain the entity information H contained in the corpus group G;
S8, clustering the entity information H to obtain second clustered entity information I;
S9, building, from the second clustered entity information I, a second co-word network J corresponding to the corpus group G;
S10, calculating the similarity between the first co-word network F and the second co-word network J.
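Neither the construction of the co-word networks (S5, S9) nor the similarity measure (S10) is specified. A minimal sketch, assuming that the nodes of each network are clustered entities, that an edge weight counts the text units (records of C, or documents of G) in which two entities co-occur, and that S10 uses cosine similarity over edge weights; other measures, such as weighted Jaccard, would fit the text equally well. `build_coword_network` and `cosine_similarity` are illustrative names.

```python
import math
from collections import Counter
from itertools import combinations


def build_coword_network(units):
    """Build a co-word network from text units.

    `units` is an iterable of collections of canonical entity names, one per
    text unit. Returns a Counter mapping each unordered entity pair (a
    frozenset) to the number of units in which both entities occur.
    """
    edges = Counter()
    for entities in units:
        for a, b in combinations(sorted(set(entities)), 2):
            edges[frozenset((a, b))] += 1
    return edges


def cosine_similarity(f, j):
    """Cosine similarity of two co-word networks viewed as sparse edge-weight
    vectors; returns 0.0 if either network has no edges."""
    dot = sum(weight * j[edge] for edge, weight in f.items() if edge in j)
    norm_f = math.sqrt(sum(w * w for w in f.values()))
    norm_j = math.sqrt(sum(w * w for w in j.values()))
    if norm_f == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_f * norm_j)


if __name__ == "__main__":
    network_f = build_coword_network([{"Xiamen", "Fujian", "Port"}])
    network_j = build_coword_network([{"Xiamen", "Port"}, {"Xiamen", "Port"}])
    print(round(cosine_similarity(network_f, network_j), 3))  # 0.577
```

A threshold on this score (no value is given in the description) would then decide whether the corpus group G is the required information.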
In an alternative embodiment, the initial data C includes structured data, unstructured data, and semi-structured data.
In an alternative embodiment, the initial data C is input into the knowledge graph A and processed in S3 as follows:
S31, converting the initial data C into structured data K;
S32, performing word segmentation on the structured data K to obtain the entity fragment information group D.
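Step S31 does not say how the three kinds of initial data are converted. A minimal sketch, assuming that structured data K is a flat field-to-value mapping: Python dicts are treated as structured input, strings holding a JSON object as semi-structured input, and any other string as unstructured text stored under a hypothetical `text` key. `to_structured` and `_flatten` are illustrative names.

```python
import json


def _flatten(obj, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(_flatten(value, name))
        else:
            flat[name] = value
    return flat


def to_structured(item):
    """Convert one item of initial data C into structured data K (a flat dict)."""
    if isinstance(item, str):
        try:
            parsed = json.loads(item)  # semi-structured: a JSON object string
        except json.JSONDecodeError:
            return {"text": item}  # unstructured: plain text
        if not isinstance(parsed, dict):
            return {"text": item}  # valid JSON but not an object: keep as text
        item = parsed
    if not isinstance(item, dict):
        raise TypeError(f"unsupported input type: {type(item).__name__}")
    return _flatten(item)  # structured (or parsed semi-structured) input


if __name__ == "__main__":
    print(to_structured({"name": "Xiamen", "geo": {"province": "Fujian"}}))
    print(to_structured('{"title": "Port report", "meta": {"year": 2019}}'))
    print(to_structured("Xiamen port throughput rose in 2019."))
```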
In an alternative embodiment, after word segmentation of the structured data K, the segmented output is filtered to remove interfering and useless words.
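Step S32 and the subsequent filtering can be sketched as follows. This assumes that a regular-expression tokenizer stands in for a real word segmenter (Chinese text would normally need a dedicated segmenter, since `\w+` returns whole runs of unspaced characters) and that "interfering and useless words" means stop words, pure numbers, and single-character tokens. `STOP_WORDS`, `segment`, and `filter_tokens` are hypothetical.

```python
import re

# Hypothetical stop-word list; a real deployment would use a curated list.
STOP_WORDS = {"the", "of", "and", "in", "a", "to", "is"}
TOKEN_RE = re.compile(r"\w+")


def segment(record):
    """Split every string value of a structured record (data K) into lower-cased
    tokens; non-string values are skipped."""
    tokens = []
    for value in record.values():
        if isinstance(value, str):
            tokens.extend(TOKEN_RE.findall(value.lower()))
    return tokens


def filter_tokens(tokens, stop_words=STOP_WORDS, min_len=2):
    """Drop stop words, pure numbers, and tokens shorter than `min_len`,
    leaving the entity fragment information group D."""
    return [
        t for t in tokens
        if t not in stop_words and not t.isdigit() and len(t) >= min_len
    ]


if __name__ == "__main__":
    record_k = {"title": "The Port of Xiamen", "year": "2019", "count": 5}
    print(filter_tokens(segment(record_k)))
```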
In an alternative embodiment, inputting the initial data C into the knowledge graph A in S3 yields a knowledge graph A1 related to the initial data C. A staff member reviews A1 as needed to determine whether it is useful; if it is, A1 is merged into the knowledge graph A.
In an alternative embodiment, inputting the corpus group G into the knowledge graph A in S7 yields a knowledge graph A2 associated with the corpus group G. A staff member reviews A2 as needed to determine whether it is useful; if it is, A2 is merged into the knowledge graph A.
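The knowledge graphs A1 and A2 and the review-then-merge step can be sketched with each knowledge graph held as a set of (head, relation, tail) triples. This is a minimal sketch under assumptions: A1 (or A2) is taken to be the triples of A that mention the input's entities plus any candidate triples newly extracted from the input, and the staff review is modelled as a callback `is_useful`. `related_subgraph` and `merge_if_useful` are hypothetical names.

```python
def related_subgraph(graph, entities, new_triples=()):
    """Knowledge graph A1 (or A2): triples of `graph` whose head or tail is one
    of `entities`, plus candidate triples newly extracted from the input."""
    related = {t for t in graph if t[0] in entities or t[2] in entities}
    return related | set(new_triples)


def merge_if_useful(graph, subgraph, is_useful):
    """Return `graph` merged with `subgraph` if the reviewer callback approves
    it; otherwise return `graph` unchanged. Neither input is mutated."""
    if is_useful(subgraph):
        return graph | subgraph
    return graph


if __name__ == "__main__":
    graph_a = {
        ("Xiamen", "located_in", "Fujian"),
        ("Paris", "capital_of", "France"),
    }
    graph_a1 = related_subgraph(
        graph_a, {"Xiamen"}, [("Xiamen", "has_port", "Port of Xiamen")]
    )
    graph_a = merge_if_useful(graph_a, graph_a1, is_useful=lambda g: len(g) > 0)
    print(sorted(graph_a))
```

In practice the callback would present the subgraph to a reviewer; the automatic lambda above only keeps the example runnable.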
In an alternative embodiment, in S6 the retrieved corpus information is screened before the corpus group G is formed, as follows:
detecting whether each piece of retrieved corpus information contains an entity;
if it contains an entity, adding it to the corpus group G;
if it does not contain an entity, discarding it.
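Step S6 together with the screening rule above can be sketched as two filters over database B. This assumes that database B is a list of text documents, that searching means case-insensitive substring matching against the fragments of group D, and that entity detection means substring matching against entity names known to knowledge graph A; a production system would use an index and a proper entity recognizer. `search_corpus` and `screen_corpus` are illustrative names.

```python
def search_corpus(database, fragments):
    """S6 search: documents of database B mentioning at least one fragment of
    the entity fragment information group D (case-insensitive)."""
    frags = [f.lower() for f in fragments]
    return [doc for doc in database if any(f in doc.lower() for f in frags)]


def screen_corpus(candidates, known_entities):
    """S6 screening: keep candidates that contain an entity known to knowledge
    graph A (these form corpus group G); discard the rest."""
    entities = [e.lower() for e in known_entities]
    return [doc for doc in candidates if any(e in doc.lower() for e in entities)]


if __name__ == "__main__":
    database_b = [
        "Xiamen port throughput rose",
        "Port fees were unchanged",
        "Weather today",
    ]
    candidates = search_corpus(database_b, ["port"])
    print(screen_corpus(candidates, {"Xiamen"}))
```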
In the invention, when related information needs to be obtained from a database, the information to be obtained is first subjected to pattern matching, which identifies items of different information that share the same attribute; the initial data C having that attribute is selected and input into the knowledge graph A to obtain an entity fragment information group D. The entity fragment information group D is clustered, and a first co-word network F corresponding to the initial data C is built. A corpus group G containing the entity fragment information group D is obtained from the database B and input into the knowledge graph A; the resulting entity information H is clustered, and a second co-word network J corresponding to the corpus group G is built. Finally, the similarity between the first co-word network F and the second co-word network J is calculated to determine whether the information in the corpus group G obtained from the database B is the required information. This improves data processing efficiency and greatly reduces data retrieval time.
It should be understood that the above embodiments of the present invention merely illustrate or explain the principles of the invention and are not to be construed as limiting it. Therefore, any modification, equivalent replacement, improvement, or the like made without departing from the spirit and scope of the present invention shall fall within its protection scope. Further, the appended claims are intended to cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.

Claims (7)

1. A pattern-matching-based method for calculating the similarity of entity parallel relations, characterized by comprising the following specific steps:
S1, constructing a knowledge graph A and a database B;
S2, inputting initial data C that has been successfully pattern-matched;
S3, inputting the initial data C into the knowledge graph A to obtain an entity fragment information group D;
S4, clustering the entity fragment information group D to obtain first clustered entity information E;
S5, building, from the first clustered entity information E, a first co-word network F corresponding to the initial data C;
S6, searching the database B for corpus entries that contain the entity fragment information group D to obtain a corpus group G;
S7, inputting the corpus group G into the knowledge graph A to obtain the entity information H contained in the corpus group G;
S8, clustering the entity information H to obtain second clustered entity information I;
S9, building, from the second clustered entity information I, a second co-word network J corresponding to the corpus group G;
S10, calculating the similarity between the first co-word network F and the second co-word network J.
2. The pattern-matching-based entity parallel relation similarity calculation method according to claim 1, wherein the initial data C includes structured data, unstructured data, and semi-structured data.
3. The pattern-matching-based entity parallel relation similarity calculation method according to claim 1, wherein inputting the initial data C into the knowledge graph A and processing it in S3 specifically comprises:
S31, converting the initial data C into structured data K;
S32, performing word segmentation on the structured data K to obtain the entity fragment information group D.
4. The pattern-matching-based entity parallel relation similarity calculation method according to claim 3, wherein the information produced by word segmentation of the structured data K is filtered.
5. The pattern-matching-based entity parallel relation similarity calculation method according to claim 1, wherein inputting the initial data C into the knowledge graph A in S3 yields a knowledge graph A1 related to the initial data C.
6. The pattern-matching-based entity parallel relation similarity calculation method according to claim 1, wherein inputting the corpus group G into the knowledge graph A in S7 yields a knowledge graph A2 associated with the corpus group G.
7. The pattern-matching-based entity parallel relation similarity calculation method according to claim 1, wherein in S6 the retrieved corpus information is screened before the corpus group G is formed, specifically by: detecting whether each piece of retrieved corpus information contains an entity;
if it contains an entity, adding it to the corpus group G;
if it does not contain an entity, discarding it.
CN201910583113.1A 2019-07-01 2019-07-01 Entity parallel relation similarity calculation method based on pattern matching Pending CN110781309A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910583113.1A CN110781309A (en) 2019-07-01 2019-07-01 Entity parallel relation similarity calculation method based on pattern matching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910583113.1A CN110781309A (en) 2019-07-01 2019-07-01 Entity parallel relation similarity calculation method based on pattern matching

Publications (1)

Publication Number Publication Date
CN110781309A (en) 2020-02-11

Family

ID=69383870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910583113.1A Pending CN110781309A (en) 2019-07-01 2019-07-01 Entity parallel relation similarity calculation method based on pattern matching

Country Status (1)

Country Link
CN (1) CN110781309A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767410A (en) * 2020-06-30 2020-10-13 平安国际智慧城市科技股份有限公司 Construction method, device, equipment and storage medium of clinical medical knowledge map

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103198103A (en) * 2013-03-20 2013-07-10 微梦创科网络科技(中国)有限公司 Microblog pushing method and device based on dense word clustering
CN103365912A (en) * 2012-04-06 2013-10-23 富士通株式会社 Method and device for clustering and extracting entity relationship modes
CN104866593A (en) * 2015-05-29 2015-08-26 中国电子科技集团公司第二十八研究所 Database searching method based on knowledge graph
CN107944559A (en) * 2017-11-24 2018-04-20 国家计算机网络与信息安全管理中心 A kind of entity relationship automatic identifying method and system
CN107992480A (en) * 2017-12-25 2018-05-04 东软集团股份有限公司 A kind of method, apparatus for realizing entity disambiguation and storage medium, program product

Similar Documents

Publication Publication Date Title
US11977541B2 (en) Systems and methods for rapid data analysis
CN106033416B (en) Character string processing method and device
TW202029079A (en) Method and device for identifying irregular group
CN111522968B (en) Knowledge graph fusion method and device
CN110688549B (en) Artificial intelligence classification method and system based on knowledge system map construction
WO2019001429A1 (en) Multisource data fusion method and apparatus
US20170109358A1 (en) Method and system of determining enterprise content specific taxonomies and surrogate tags
CN114461644A (en) Data acquisition method and device, electronic equipment and storage medium
CN112084761B (en) Hydraulic engineering information management method and device
US10250550B2 (en) Social message monitoring method and apparatus
CN110825817B (en) Enterprise suspected association judgment method and system
CN104965846B (en) Visual human's method for building up in MapReduce platform
CN110781309A (en) Entity parallel relation similarity calculation method based on pattern matching
CN112363996A (en) Method, system, and medium for building a physical model of a power grid knowledge graph
CN115329748B (en) Log analysis method, device, equipment and storage medium
CN105573984B (en) The recognition methods of socio-economic indicator and device
CN115048352B (en) Log field extraction method, device, equipment and storage medium
CN116303379A (en) Data processing method, system and computer storage medium
KR20130109601A (en) Decision method of ontology instance similarity and ontology system using the method
US20150324813A1 (en) System and method for determining by an external entity the human hierarchial structure of an rganization, using public social networks
CN111597212B (en) Data retrieval method and device
CN117131245B (en) Method for realizing directory resource recommendation mechanism by using knowledge graph technology
JP2013246478A (en) Determination device, learning method and learning program
US12014169B2 (en) Software recognition using tree-structured pattern matching rules for software asset management
CN116975774A (en) Mechanism name fusion method, terminal equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200211