WO2012091539A1 - A semantic similarity matching system and a method thereof - Google Patents

Info

Publication number
WO2012091539A1
Authority
WO
WIPO (PCT)
Prior art keywords
conceptual
semantic similarity
graphs
conceptual graphs
similarity matching
Prior art date
2010-12-28
Application number
PCT/MY2011/000150
Other languages
French (fr)
Inventor
Dickson Lukose
Mohd Faizul SULAIMAN
Abdul Wahab DAHALAN
Original Assignee
Mimos Berhad
Priority date
2010-12-28
Filing date
2011-06-24
Publication date
2012-07-05
Application filed by Mimos Berhad
Publication of WO2012091539A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

A semantic similarity matching system (100) of a plurality of conceptual graphs is provided. The system (100) includes a semantic similarity matching component (110), which further includes a conceptual graph processor (120); a database (130) operatively connectable to the conceptual graph processor (120); and a semantic similarity calculator (140), wherein the output of the semantic similarity calculator (140) is a similarity index (SI) of the plurality of conceptual graphs.

Description

A SEMANTIC SIMILARITY MATCHING SYSTEM AND A METHOD THEREOF
FIELD OF INVENTION
The present invention relates to a semantic similarity matching system of a plurality of conceptual graphs and a method thereof.
BACKGROUND OF INVENTION
Retrieval of information from a knowledge base satisfies a user only when the retrieved information is relevant to the queries the user submits. However, techniques for retrieving relevant information are still immature, as the field remains at a developmental stage. U.S. 6,810,376 B1 describes a system and associated methods for determining the semantic similarity of different sentences to one another. Unfortunately, that approach requires the sentences to be broken up into words before a similarity calculation can be performed between a first and a second set of words. U.S. 5,331,554 describes a method and apparatus using automated semantic pattern recognition, in which a user may query for information in a text and the system displays the location of that information. To do this, the text must first be converted into a tree-like structure that forms a knowledge base. The degree of similarity between a query and a node is then measured against a predetermined threshold value to determine whether the actual location of the node is displayed. However, this method requires the threshold value to be fixed before the process runs, and its effectiveness depends heavily on that choice, which in turn requires separate criteria for selecting the threshold.
John F. Sowa (1984), "Conceptual Structures: Information Processing in Mind and Machine", Addison-Wesley, describes the maximal join algorithm: given two graphs that share compatible sub-graphs, the algorithm attempts to build a new graph in which the two initial graphs are fused along their compatible sub-graph. There is therefore a need for an accurate and efficient solution for matching natural language queries against a set of data returned by any search method.
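To make the maximal join idea concrete, here is a minimal sketch under loudly stated simplifications: two concepts count as compatible only when their type labels are identical (no type hierarchy, no referents), the matched part must carry identical relations in both graphs, and the search is brute force, so it only suits tiny graphs. The dictionary representation and the names `maximal_join`, `_largest_mapping` and `_relations_agree` are hypothetical; they come neither from the patent nor from Sowa's text.

```python
from itertools import combinations, permutations

# Hypothetical representation (assumed for this sketch, not from the patent):
#   {"concepts": {concept_id: type_label}, "relations": {(label, src_id, dst_id), ...}}


def _relations_agree(mapping, g1, g2):
    """True if relations among matched concepts are identical in both graphs."""
    inverse = {v: k for k, v in mapping.items()}
    for label, src, dst in g1["relations"]:
        if src in mapping and dst in mapping:
            if (label, mapping[src], mapping[dst]) not in g2["relations"]:
                return False
    for label, src, dst in g2["relations"]:
        if src in inverse and dst in inverse:
            if (label, inverse[src], inverse[dst]) not in g1["relations"]:
                return False
    return True


def _largest_mapping(g1, g2):
    """Brute force: try the biggest concept matchings first (tiny graphs only)."""
    ids1, ids2 = list(g1["concepts"]), list(g2["concepts"])
    for k in range(min(len(ids1), len(ids2)), 0, -1):
        for subset in combinations(ids1, k):
            for targets in permutations(ids2, k):
                mapping = dict(zip(subset, targets))
                # Simplification: "compatible" means identical type labels.
                same_types = all(
                    g1["concepts"][a] == g2["concepts"][b] for a, b in mapping.items()
                )
                if same_types and _relations_agree(mapping, g1, g2):
                    return mapping
    return {}


def maximal_join(g1, g2):
    """Fuse g1 and g2 along their largest compatible part.

    Returns (joined_graph, number_of_matched_concepts).
    """
    mapping = _largest_mapping(g1, g2)
    inverse = {v: k for k, v in mapping.items()}

    def from_g2(cid):
        # Matched g2 concepts collapse onto their g1 partner; the rest keep a
        # ("g2", id) key so ids from the two graphs cannot collide.
        return ("g1", inverse[cid]) if cid in inverse else ("g2", cid)

    concepts = {("g1", cid): label for cid, label in g1["concepts"].items()}
    concepts.update({from_g2(cid): label for cid, label in g2["concepts"].items()})
    relations = {(r, ("g1", s), ("g1", d)) for r, s, d in g1["relations"]}
    relations |= {(r, from_g2(s), from_g2(d)) for r, s, d in g2["relations"]}
    return {"concepts": concepts, "relations": relations}, len(mapping)
```

For g1 = [Cat] -> (On) -> [Mat] and g2 = [Cat] -> (On) -> [Mat] -> (In) -> [Kitchen], this sketch matches two concepts and yields a joined graph with three concepts (2 + 3 - 2) and two relations, since the shared (On) relation is stored once.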
SUMMARY OF INVENTION
Accordingly, there is provided a semantic similarity matching system of a plurality of conceptual graphs. The system includes a semantic similarity matching component, which further includes a conceptual graph processor; a database operatively connectable to the conceptual graph processor; and a semantic similarity calculator, wherein the output of the semantic similarity calculator is a similarity index (SI) of the plurality of conceptual graphs.
There is also provided a method of semantic similarity matching of a plurality of conceptual graphs. The method includes the steps of receiving the plurality of conceptual graphs, performing a count of matched concept nodes in the plurality of conceptual graphs, retrieving the total number of concept nodes in each conceptual graph, and calculating a similarity index between the plurality of conceptual graphs. The present invention consists of several novel features and a combination of parts hereinafter fully described and illustrated in the accompanying description and drawings, it being understood that various changes in the details may be made without departing from the scope of the invention or sacrificing any of the advantages of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be fully understood from the detailed description given herein below and the accompanying drawings which are given by way of illustration only, and thus are not limitative of the present invention, wherein:
Figure 1 is a block diagram illustrating architecture of a preferred embodiment of a semantic similarity matching system of a plurality of conceptual graphs.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
The present invention relates to a semantic similarity matching system of a plurality of conceptual graphs and a method thereof. Hereinafter, this specification describes the present invention according to its preferred embodiment. However, it is to be understood that limiting the description to the preferred embodiment is merely to facilitate discussion of the present invention, and it is envisioned that those skilled in the art may devise various modifications and equivalents without departing from the scope of the appended claims. The preferred embodiment is described below in accordance with the attached drawings, either individually or in combination.
The present invention provides a semantic similarity matching system (100) of a plurality of conceptual graphs, as seen in Figure 1. The system (100) includes a semantic similarity matching component (110). The system (100) further includes a conceptual graph processor (120) and a conceptual graph knowledge base (130) operatively connectable to the conceptual graph processor (120). A semantic similarity calculator (140) is also included in the system (100), wherein the output of the semantic similarity calculator (140) is a similarity index (SI) (150) of the plurality of conceptual graphs. The conceptual graph processor (120) functions as the data layer of the system (100).
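The wiring of Figure 1 can be sketched as a handful of cooperating classes, with the reference numerals noted in comments. This is a structural sketch only, under stated assumptions: each graph is reduced to a list of its concept type labels, and the processor's matched-concept count is a stand-in (labels shared by both graphs), not a maximal join. All class and method names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConceptualGraphKnowledgeBase:
    """(130) Stores conceptual graphs by id; each graph is reduced to its concept labels."""
    graphs: Dict[str, List[str]] = field(default_factory=dict)


class ConceptualGraphProcessor:
    """(120) Data layer: every graph access goes through here."""

    def __init__(self, kb: ConceptualGraphKnowledgeBase) -> None:
        self.kb = kb

    def concept_count(self, graph_id: str) -> int:
        return len(self.kb.graphs[graph_id])

    def matched_concept_count(self, id1: str, id2: str) -> int:
        # Stand-in for the maximal join (assumption): count concept labels
        # shared by both graphs, respecting multiplicity.
        shared = Counter(self.kb.graphs[id1]) & Counter(self.kb.graphs[id2])
        return sum(shared.values())


class SemanticSimilarityCalculator:
    """(140) Turns the three counts into a similarity index."""

    @staticmethod
    def similarity_index(max_join_size: int, cg1_size: int, cg2_size: int) -> float:
        denominator = cg1_size + cg2_size - max_join_size
        # Assumption: two empty graphs (denominator 0) are scored 0.0.
        return max_join_size / denominator if denominator else 0.0


class SemanticSimilarityMatchingComponent:
    """(110) Accepts two graph ids and returns one SI (150)."""

    def __init__(self, processor: ConceptualGraphProcessor,
                 calculator: SemanticSimilarityCalculator) -> None:
        self.processor = processor
        self.calculator = calculator

    def match(self, cg1_id: str, cg2_id: str) -> float:
        # Count matched concept nodes via the processor (data layer).
        max_join_size = self.processor.matched_concept_count(cg1_id, cg2_id)
        # Retrieve the total number of concept nodes in each graph.
        cg1_size = self.processor.concept_count(cg1_id)
        cg2_size = self.processor.concept_count(cg2_id)
        # Hand the counts to the calculator, whose output is the SI.
        return self.calculator.similarity_index(max_join_size, cg1_size, cg2_size)
```

Keeping the knowledge base behind the processor means the matching component never touches stored graphs directly, which is one reading of the processor acting as the data layer; a real maximal join could replace `matched_concept_count` without changing the other classes.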
The system (100) matches a plurality of conceptual graphs, such as two conceptual graphs, by applying a maximal join operation to both conceptual graphs, and returns a similarity index (SI) (150) in the range 0 to 1. A value of 1 indicates that the conceptual graphs are identical, and a value of 0 indicates that they are completely dissimilar. For example, the semantic similarity matching component (110) takes two conceptual graphs, such as cg1 and cg2 as seen in Figure 1, performs a maximal join operation using the conceptual graph processor (120), and returns the count of matched concept nodes in cg1 and cg2. The total number of concept nodes in each conceptual graph is then retrieved. These counts are used to calculate the similarity index using the formula:
Similarity index (SI) = maxJoinSize / (cg1size + cg2size - maxJoinSize)
where:
maxJoinSize = the number of concept nodes that are maximally joined across both graphs;
cg1size = the number of concept nodes in cg1;
cg2size = the number of concept nodes in cg2.
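A worked reading of the formula, with sizes chosen purely for illustration (they are not from the source). One assumption drives the interpretation: if the maximal join fuses each matched pair of concept nodes into a single node, the denominator is exactly the number of concept nodes in the joined graph, so SI is the fraction of the joined graph's concepts that both inputs share, the same shape as the Jaccard index. The bounds follow because maxJoinSize can never exceed the smaller graph's size, assuming at least one graph is non-empty so the denominator is positive.

```latex
\begin{aligned}
\mathrm{SI} &= \frac{\mathit{maxJoinSize}}{\mathit{cg1size} + \mathit{cg2size} - \mathit{maxJoinSize}} \\[4pt]
\mathit{cg1size} = 4,\ \mathit{cg2size} = 5,\ \mathit{maxJoinSize} = 3
  \;&\Rightarrow\; \mathrm{SI} = \frac{3}{4 + 5 - 3} = \frac{3}{6} = 0.5 \\[4pt]
0 \le \mathit{maxJoinSize} \le \min(\mathit{cg1size}, \mathit{cg2size})
  \;&\Rightarrow\; \mathit{cg1size} + \mathit{cg2size} - \mathit{maxJoinSize} \ge \mathit{maxJoinSize}
  \;\Rightarrow\; 0 \le \mathrm{SI} \le 1 \\[4pt]
\mathrm{SI} = 1 &\iff \mathit{maxJoinSize} = \mathit{cg1size} = \mathit{cg2size}
\end{aligned}
```

The last line matches the stated meaning of a value of 1: every concept node of each graph takes part in the join.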
In this embodiment, the semantic similarity matching component (110) accepts exactly two conceptual graphs per matching operation and returns one similarity index (SI).
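Because the component scores exactly one pair per call, comparing a query graph against a collection of stored graphs (the query-matching need stated in the background) means one call per candidate. A minimal sketch, assuming any pairwise SI function is passed in; `rank_by_similarity` is a hypothetical helper and the graph type is left generic.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

G = TypeVar("G")  # whatever graph representation the caller uses


def rank_by_similarity(
    query: G,
    candidates: Sequence[Tuple[str, G]],
    similarity: Callable[[G, G], float],
) -> List[Tuple[str, float]]:
    """Score every (name, graph) candidate against the query, best first.

    Each candidate costs one pairwise call, mirroring a component that accepts
    exactly two conceptual graphs and returns one SI per call.
    """
    scored = [(name, similarity(query, graph)) for name, graph in candidates]
    # sorted() is stable even with reverse=True, so equal scores keep input order.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Ranking by SI sidesteps the fixed threshold criticized in the background; a caller can still cut the list at any score afterwards if it wants to.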
The system (100) uses a conceptual graph (CG) representation to compare the similarity of two CGs. A CG is made up of a combination of concept (C) nodes and relation (R) nodes. The example below shows a representation of two conceptual graphs, cg1 and cg2:
[Figure: conceptual graphs cg1 and cg2, composed of concept nodes such as [C1] and relation nodes such as (R1)]
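A minimal data-structure sketch of a CG built from concept and relation nodes. Assumptions: every relation node is dyadic (one input concept, one output concept), whereas CGs in general allow relations of any arity; the class and method names are hypothetical; and the rendering follows the style of Sowa's linear notation, [Concept] -> (Relation) -> [Concept].

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ConceptualGraph:
    """Hypothetical container: concept nodes plus relation nodes linking them."""
    concepts: Dict[str, str] = field(default_factory=dict)  # id -> concept type
    # Each relation node links one input concept to one output concept (dyadic).
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_concept(self, cid: str, ctype: str) -> None:
        self.concepts[cid] = ctype

    def add_relation(self, rtype: str, src: str, dst: str) -> None:
        # Relation nodes may only connect concept nodes that already exist.
        if src not in self.concepts or dst not in self.concepts:
            raise KeyError("relation must connect existing concept nodes")
        self.relations.append((rtype, src, dst))

    def linear_form(self) -> List[str]:
        # One line per relation: [Concept] -> (Relation) -> [Concept]
        return [f"[{self.concepts[s]}] -> ({r}) -> [{self.concepts[d]}]"
                for r, s, d in self.relations]
```

Keeping concept ids separate from concept types lets two nodes share a type (for example two [Person] nodes) while remaining distinct, which matters when counting concept nodes for cg1size and cg2size.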
CGs express meaning in a form that is logically precise, human-readable, and computationally tractable. With a direct mapping to language, conceptual graphs serve as an intermediate language for translating computer-oriented formalisms to and from natural languages. With their graphic representation, CGs function as a readable but formal design and specification language. The described method and system can be applied to, but are not restricted to, a variety of applications such as information retrieval, database design, expert systems, and natural language processing.

Claims

1. A semantic similarity matching system (100) of a plurality of conceptual graphs, the system (100) includes:
a semantic similarity matching component (110) which further includes a conceptual graph processor (120);
a database (130) operatively connectable to the conceptual graph processor (120); and
a semantic similarity calculator (140) wherein output of the semantic similarity calculator (140) is a similarity index (SI) of the plurality of conceptual graphs.
2. The system (100) as claimed in claim 1, wherein the plurality of conceptual graphs are two conceptual graphs.
3. A method of semantic similarity matching of a plurality of conceptual graphs, the method includes the steps of:
i. receiving the plurality of conceptual graphs;
ii. performing a count of matched concept nodes in the plurality of conceptual graphs;
iii. retrieving a total of concept nodes in each conceptual graph; and
iv. indexing similarity on concept nodes in the plurality of conceptual graphs.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
MYPI2010006269 2010-12-28
MYPI2010006269 MY151371A (en) 2010-12-28 2010-12-28 A semantic similarity matching system and a method thereof

Publications (1)

Publication Number Publication Date
WO2012091539A1 (en) 2012-07-05

Family

ID=46383349

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/MY2011/000150 WO2012091539A1 (en) 2010-12-28 2011-06-24 A semantic similarity matching system and a method thereof

Country Status (2)

Country Link
MY (1) MY151371A (en)
WO (1) WO2012091539A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015178758A1 (en) * 2014-05-19 2015-11-26 Mimos Berhad A system and method for analyzing concept evolution using network analysis
CN105893671A (en) * 2016-03-30 2016-08-24 Zhejiang University Complex mechanical and electrical product system design model verification method based on expansion concept map
CN105900081A (en) * 2013-02-19 2016-08-24 Google Inc. Natural language processing based search
CN106610934A (en) * 2016-07-08 2017-05-03 Sichuan Yonglian Information Technology Co., Ltd. Novel semantic similarity solving method in intelligent manufacturing industry

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHONG, J. ET AL.: "Conceptual Graph Matching for Semantic Search", LECTURE NOTES IN COMPUTER SCIENCE, vol. 2393, 2002, pages 92 - 106, XP002355172 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105900081A (en) * 2013-02-19 2016-08-24 Google Inc. Natural language processing based search
CN105900081B (en) * 2013-02-19 2020-09-08 Google LLC Search based on natural language processing
WO2015178758A1 (en) * 2014-05-19 2015-11-26 Mimos Berhad A system and method for analyzing concept evolution using network analysis
CN105893671A (en) * 2016-03-30 2016-08-24 Zhejiang University Complex mechanical and electrical product system design model verification method based on expansion concept map
CN106610934A (en) * 2016-07-08 2017-05-03 Sichuan Yonglian Information Technology Co., Ltd. Novel semantic similarity solving method in intelligent manufacturing industry

Also Published As

Publication number Publication date
MY151371A (en) 2014-05-30

Similar Documents

Publication Publication Date Title
CN108804641B (en) Text similarity calculation method, device, equipment and storage medium
CN108182972B (en) Intelligent coding method and system for Chinese disease diagnosis based on word segmentation network
KR102407510B1 (en) Method, apparatus, device and medium for storing and querying data
CN105095204B (en) The acquisition methods and device of synonym
CN108182207B (en) Intelligent coding method and system for Chinese surgical operation based on word segmentation network
US8761512B1 (en) Query by image
CN108154198B (en) Knowledge base entity normalization method, system, terminal and computer readable storage medium
CN105224648A (en) A kind of entity link method and system
JP2020500371A (en) Apparatus and method for semantic search
CN105659225A (en) Query expansion and query-document matching using path-constrained random walks
CN102402561B (en) Searching method and device
CN104199965A (en) Semantic information retrieval method
CN110569328A (en) Entity linking method, electronic device and computer equipment
CN104112005B (en) Distributed mass fingerprint identification method
CN103218373A (en) System, method and device for relevant searching
CN103678336A (en) Method and device for identifying entity words
CN105677725A (en) Preset parsing method for tourism vertical search engine
CN111026877A (en) Knowledge verification model construction and analysis method based on probability soft logic
Gross et al. How do Computed Ontology Mappings Evolve?-A Case Study for Life Science Ontologies.
WO2012091539A1 (en) A semantic similarity matching system and a method thereof
CN109872775A (en) A kind of document mask method, device, equipment and computer-readable medium
US7734633B2 (en) Listwise ranking
CN108287850B (en) Text classification model optimization method and device
CN102314464B (en) Lyrics searching method and lyrics searching engine
CN102915381B (en) Visual network retrieval based on multi-dimensional semantic presents system and presents control method

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 11853675

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 11853675

Country of ref document: EP

Kind code of ref document: A1