CN107451123B - Chinese word semantic relation recognition method and device based on multiple Chinese knowledge resources - Google Patents

Publication number
CN107451123B
Authority
CN
China
Prior art keywords
words
word
relation
antisense
extracting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710707420.7A
Other languages
Chinese (zh)
Other versions
CN107451123A (en)
Inventor
鹿文鹏
孟凡擎
杜月寒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Southern Power Grid Internet Service Co ltd
Jingchuang United (Beijing) Intellectual Property Service Co.,Ltd.
Original Assignee
Qilu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology
Priority to CN201710707420.7A
Publication of CN107451123A
Application granted
Publication of CN107451123B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a Chinese word semantic relation recognition method and device based on multiple Chinese knowledge resources. The method comprises the following steps: obtaining an antonym set by combining multiple Chinese knowledge resources, and judging from the antonym set whether the words have an antonym relation; extracting a part-word set using multiple Chinese knowledge resources, and judging from the part-word set whether the words have a whole-part relation; extracting a synonym set using multiple Chinese knowledge resources, and judging from the synonym set whether the words have a synonym relation; extracting a hyponym set using multiple Chinese knowledge resources, and judging from the hyponym set whether the words have a hypernym-hyponym (superior-subordinate) relation. The invention makes full use of multiple Chinese knowledge resources and recognizes the semantic relations of Chinese words more accurately and effectively.

Description

Chinese word semantic relation recognition method and device based on multiple Chinese knowledge resources
Technical Field
The invention relates to the technical field of natural language processing, in particular to a Chinese word semantic relation recognition method and device based on multiple Chinese knowledge resources.
Background
Semantic relation recognition refers to automatically determining the semantic relation that holds between the two words of a given word pair. Typical semantic relations include the antonym relation, the whole-part relation, the synonym relation, and the hypernym-hyponym (superior-subordinate) relation. Semantic relation recognition is a fundamental task in natural language processing and directly affects word sense disambiguation, ontology construction, machine translation, information retrieval, text classification, and similar tasks.
Most current research on semantic relation recognition targets English. It typically relies on one or more knowledge resources and uses statistical learning methods such as support vector machines and Bayesian classifiers to classify or recognize English semantic relations, with good results. Research on Chinese semantic relation recognition is comparatively scarce, and most of it applies a statistical learning method to a single knowledge resource. Such work ignores the mining and use of other language knowledge resources; moreover, statistical learning methods are constrained by the scale of the labeled corpus, so their accuracy is hard to guarantee. As language knowledge resources are built and improved, they complement one another and provide more reliable knowledge for semantic relation recognition.
To address these problems in Chinese word semantic relation recognition, the invention fully mines the semantic relations contained in multiple knowledge resources and provides a Chinese word semantic relation recognition method and device based on multiple Chinese knowledge resources, aiming to advance the solution of these problems.
Disclosure of Invention
To overcome the defects of the prior art, the invention discloses a Chinese word semantic relation recognition method and device based on multiple Chinese knowledge resources, so that the semantic relations among Chinese words can be judged more accurately and effectively.
To this end, the invention provides the following technical scheme:
A Chinese word semantic relation recognition method based on multiple Chinese knowledge resources comprises the following steps:
step 1, obtaining an antonym set by combining multiple Chinese knowledge resources, and judging from the antonym set whether the words have an antonym relation;
step 2, extracting a part-word set using multiple Chinese knowledge resources, and judging from the part-word set whether the words have a whole-part relation;
step 3, extracting a synonym set using multiple Chinese knowledge resources, and judging from the synonym set whether the words have a synonym relation;
step 4, extracting a hyponym set using multiple Chinese knowledge resources, and judging from the hyponym set whether the words have a hypernym-hyponym relation.
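The four steps form a cascade: each relation is tested in turn, and the first test that succeeds decides the result. A minimal sketch of that control flow follows, assuming each step can be reduced to a set-membership test. The names `lookup`, `recognize_relation`, and `STEPS`, the toy lexicons, and the symmetric lookup are illustrative assumptions, not part of the patent.

```python
from typing import Callable, Dict, Optional, Sequence, Set, Tuple

Lexicon = Dict[str, Set[str]]
Checker = Callable[[str, str], bool]


def lookup(table: Lexicon) -> Checker:
    """Build a checker that tests membership in either direction (a simplification)."""
    return lambda a, b: b in table.get(a, set()) or a in table.get(b, set())


def recognize_relation(a: str, b: str,
                       steps: Sequence[Tuple[str, Checker]]) -> Optional[str]:
    """Run the checkers in order and return the first relation that holds."""
    for name, check in steps:
        if check(a, b):
            return name
    return None


# Toy lexicons, invented for illustration only (not taken from any real resource).
STEPS = [
    ("antonym", lookup({"hot": {"cold"}})),
    ("whole-part", lookup({"car": {"wheel"}})),
    ("synonym", lookup({"car": {"automobile"}})),
    ("hypernym-hyponym", lookup({"vehicle": {"car"}})),
]

print(recognize_relation("wheel", "car", STEPS))
```

Keeping the steps in an ordered list makes the priority among relations explicit and lets a resource be added or swapped without touching the control flow.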
Further, in step 1, the antonym relation is determined as follows:
step 1-1) for given words A and B, extract the antonym set ASET_A of word A using the antonym relations explicitly defined in HowNet; if B ∈ ASET_A, the two words have an antonym relation, otherwise go to step 1-2); in addition, the converse relation defined in HowNet is also treated as an antonym relation;
step 1-2) extract the antonym set ASET_A of the given word A using Baidu Chinese, and extract the synonym set SSET_A of word A using the HIT Tongyici Cilin (Extended Edition) synonym thesaurus; for each word W ∈ SSET_A, extract its antonyms and merge them into ASET_A; if B ∈ ASET_A, words A and B have an antonym relation, otherwise go to step 1-3);
step 1-3) extract the antonym set ASET_A of word A using Baidu Encyclopedia; if B ∈ ASET_A, the two words have an antonym relation, otherwise go to step 2-1).
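Steps 1-1) to 1-3) can be sketched as a three-stage fallback. The sketch below assumes each resource has already been loaded into a plain dictionary from a word to a set of words. The function and parameter names are hypothetical, and it is assumed that the antonyms of each synonym W in step 1-2) come from the same Baidu Chinese table, which the text does not state explicitly.

```python
from typing import Dict, Set

Lexicon = Dict[str, Set[str]]


def has_antonym_relation(a: str, b: str,
                         hownet_antonyms: Lexicon,
                         hownet_converses: Lexicon,
                         baidu_chinese_antonyms: Lexicon,
                         cilin_synonyms: Lexicon,
                         baike_antonyms: Lexicon) -> bool:
    # Step 1-1: HowNet antonyms, with converse relations counted as antonyms.
    aset = hownet_antonyms.get(a, set()) | hownet_converses.get(a, set())
    if b in aset:
        return True

    # Step 1-2: antonyms of A, expanded with the antonyms of A's synonyms.
    aset = set(baidu_chinese_antonyms.get(a, set()))
    for w in cilin_synonyms.get(a, set()):
        aset |= baidu_chinese_antonyms.get(w, set())  # assumed source of W's antonyms
    if b in aset:
        return True

    # Step 1-3: encyclopedia antonyms as the last resort.
    return b in baike_antonyms.get(a, set())
```

For example, with `cilin_synonyms = {"happy": {"glad"}}` and `baidu_chinese_antonyms = {"glad": {"sad"}}`, the pair ("happy", "sad") is accepted at step 1-2) through the synonym expansion even though "happy" has no antonym entry of its own.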
Further, in step 2, the whole-part relation is determined as follows:
step 2-1) extract the part-word sets MSET_A and MSET_B of words A and B using HowNet; if B ∈ MSET_A or A ∈ MSET_B, the two words have a whole-part relation, otherwise go to step 2-2);
step 2-2) use the HowNet sememe definitions: a definition containing the sememe "part" indicates that the word denotes a part of some whole, and the value of the "whole" attribute in that definition gives the sememe definition of the corresponding whole; on this basis, extract the sememe definition sets DEFSET_A and DEFSET_B of words A and B; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B such that DEF_A contains the attribute "whole" with value DEF_B, or DEF_B contains the attribute "whole" with value DEF_A, words A and B have a whole-part relation, otherwise go to step 3-1);
in addition, for words whose whole-part relation cannot be recognized by using the sememe definitions directly, a generalization can be applied: the value of the "whole" attribute above is generalized to its hypernym concept, and the remaining operations are unchanged.
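A minimal sketch of steps 2-1) and 2-2), including the generalization fallback, is given below. It assumes a much-simplified model of a HowNet definition that keeps only the primary sememe and the value of the "whole" attribute. The `Definition` class, the `hypernym_of` table, and the toy data are hypothetical. The generalization here lifts the candidate whole's primary sememe to its hypernym, which is one reading of the step.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass(frozen=True)
class Definition:
    """Simplified stand-in for a HowNet sememe definition (hypothetical model)."""
    head: str                    # primary sememe
    whole: Optional[str] = None  # value of the "whole" attribute, if present


def _is_part_of(part_defs: Iterable[Definition],
                whole_defs: Iterable[Definition],
                hypernym_of: Dict[str, str]) -> bool:
    whole_defs = list(whole_defs)
    for p in part_defs:
        if p.head != "part" or p.whole is None:
            continue
        for w in whole_defs:
            # Direct match, or match after lifting the whole to its hypernym.
            if p.whole == w.head or p.whole == hypernym_of.get(w.head):
                return True
    return False


def has_whole_part_relation(a: str, b: str,
                            msets: Dict[str, Set[str]],
                            defsets: Dict[str, Iterable[Definition]],
                            hypernym_of: Optional[Dict[str, str]] = None) -> bool:
    # Step 2-1: explicit part-word sets.
    if b in msets.get(a, set()) or a in msets.get(b, set()):
        return True
    # Step 2-2: match "whole" attributes, optionally with generalization.
    hypernym_of = hypernym_of or {}
    defs_a, defs_b = list(defsets.get(a, ())), list(defsets.get(b, ()))
    return (_is_part_of(defs_a, defs_b, hypernym_of)
            or _is_part_of(defs_b, defs_a, hypernym_of))


# Toy data: without the hypernym table the pair is rejected, with it accepted.
DEFSETS = {"person": [Definition("human")],
           "head": [Definition("part", whole="AnimalHuman")]}
print(has_whole_part_relation("person", "head", {}, DEFSETS))
print(has_whole_part_relation("person", "head", {}, DEFSETS, {"human": "AnimalHuman"}))
```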
Further, in step 3, the synonym relation is determined as follows:
step 3-1) in the HIT Tongyici Cilin (Extended Edition), the rows marked with "=" list synonyms; obtain the synonym set SSET_A of word A from these rows; if B ∈ SSET_A, words A and B have a synonym relation, otherwise go to step 3-2);
step 3-2) extract the synonym set SSET_A of word A using HowNet; if B ∈ SSET_A, words A and B have a synonym relation, otherwise go to step 3-3);
step 3-3) extract the synonym set SSET_A of word A using Baidu Chinese; if B ∈ SSET_A, words A and B have a synonym relation, otherwise go to step 3-4);
step 3-4) obtain the encyclopedia link page sets PSET_A and PSET_B of words A and B from the page links of Baidu Encyclopedia; if PSET_A and PSET_B satisfy the condition given in the original formula image (Figure BDA0001381833890000021, not reproduced in this text), words A and B have a synonym relation, otherwise go to step 4-1).
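The synonym cascade can be sketched as follows. Two points are assumptions rather than statements of the patent: the thesaurus rows are taken to be a code ending in "=" followed by whitespace-separated words, and the condition of step 3-4), which survives only as a formula image, is taken to be that the two link page sets share at least one page. All function names and the sample rows are hypothetical.

```python
from typing import Dict, Iterable, Sequence, Set

Lexicon = Dict[str, Set[str]]


def parse_cilin(lines: Iterable[str]) -> Lexicon:
    """Collect synonym sets from rows whose code ends in '=' (assumed row format)."""
    synonyms: Lexicon = {}
    for line in lines:
        tokens = line.split()
        if not tokens or not tokens[0].endswith("="):
            continue  # rows with other markers are not treated as synonym rows
        words = set(tokens[1:])
        for w in words:
            synonyms.setdefault(w, set()).update(words - {w})
    return synonyms


def has_synonym_relation(a: str, b: str,
                         resources: Sequence[Lexicon],
                         page_links: Lexicon) -> bool:
    # Steps 3-1 to 3-3: consult each synonym resource in priority order.
    for resource in resources:
        if b in resource.get(a, set()):
            return True
    # Step 3-4 (assumed condition): the link page sets overlap.
    return bool(page_links.get(a, set()) & page_links.get(b, set()))


# Invented sample rows, for illustration only.
cilin = parse_cilin(["Xx01A01= car automobile", "Xx01A02# bus tram"])
print(has_synonym_relation("car", "automobile", [cilin], {}))
```

Passing the resources as an ordered sequence keeps the priority of steps 3-1) to 3-3) in one place.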
Further, in step 4, the hypernym-hyponym relation is determined as follows:
step 4-1) extract the hyponym sets HSET_A and HSET_B of words A and B using HowNet; if B ∈ HSET_A or A ∈ HSET_B, words A and B have a hypernym-hyponym relation, otherwise go to step 4-2);
step 4-2) extract the sememe definition sets DEFSET_A and DEFSET_B of words A and B according to the hypernym-hyponym relations implied by the HowNet sememe definitions; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B whose primary sememes are identical and which satisfy either of the two conditions given in the original formula images (Figures BDA0001381833890000031 and BDA0001381833890000032, not reproduced in this text), words A and B have a hypernym-hyponym relation.
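Step 4-1) is a two-way membership test on the hyponym sets. The sketch below implements only that test and leaves step 4-2) as a pluggable predicate, because its exact conditions are available only as formula images. The function name and the `definition_check` parameter are hypothetical.

```python
from typing import Callable, Dict, Optional, Set


def has_hypernym_relation(a: str, b: str,
                          hsets: Dict[str, Set[str]],
                          definition_check: Optional[Callable[[str, str], bool]] = None
                          ) -> bool:
    # Step 4-1: B is a hyponym of A, or A is a hyponym of B.
    if b in hsets.get(a, set()) or a in hsets.get(b, set()):
        return True
    # Step 4-2: delegated to a caller-supplied test on the sememe definitions.
    return bool(definition_check and definition_check(a, b))


HSETS = {"motor vehicle": {"bus", "taxi", "truck"}}
print(has_hypernym_relation("truck", "motor vehicle", HSETS))
```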
A Chinese word semantic relation recognition device based on multiple Chinese knowledge resources comprises the following units:
an antonym relation recognition unit, used for obtaining an antonym set using multiple Chinese knowledge resources and judging from the antonym set whether the words have an antonym relation;
a whole-part relation recognition unit, used for extracting a part-word set using multiple Chinese knowledge resources and judging from the part-word set whether the words have a whole-part relation;
a synonym relation recognition unit, used for extracting a synonym set using multiple Chinese knowledge resources and judging from the synonym set whether the words have a synonym relation;
and a hypernym-hyponym relation recognition unit, used for extracting a hyponym set using multiple Chinese knowledge resources and judging from the hyponym set whether the words have a hypernym-hyponym relation.
Further, the antonym relation recognition unit comprises:
a HowNet antonym relation recognition unit, used for extracting, for given words A and B, the antonym set ASET_A of word A using the antonym relations explicitly defined in HowNet; if B ∈ ASET_A, the two words have an antonym relation, otherwise control passes to the Baidu Chinese antonym relation recognition unit; in addition, the converse relation defined in HowNet is also treated as an antonym relation;
a Baidu Chinese antonym relation recognition unit, used for extracting the antonym set ASET_A of the given word A using Baidu Chinese and the synonym set SSET_A of word A using the HIT Tongyici Cilin (Extended Edition); for each word W ∈ SSET_A, its antonyms are extracted and merged into ASET_A; if B ∈ ASET_A, words A and B have an antonym relation, otherwise control passes to the Baidu Encyclopedia antonym relation recognition unit;
a Baidu Encyclopedia antonym relation recognition unit, used for extracting the antonym set ASET_A of word A using Baidu Encyclopedia; if B ∈ ASET_A, the two words have an antonym relation, otherwise control passes to the whole-part relation recognition unit.
Further, the whole-part relation recognition unit comprises:
a HowNet whole-part relation recognition unit, used for extracting the part-word sets MSET_A and MSET_B of words A and B using HowNet; if B ∈ MSET_A or A ∈ MSET_B, the two words have a whole-part relation, otherwise control passes to the sememe-definition whole-part relation recognition unit;
a sememe-definition whole-part relation recognition unit, used for processing with the HowNet sememe definitions, in which a definition containing the sememe "part" indicates that the word denotes a part of some whole and the value of the "whole" attribute gives the sememe definition of the corresponding whole; the sememe definition sets DEFSET_A and DEFSET_B of words A and B are extracted accordingly; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B such that DEF_A contains the attribute "whole" with value DEF_B, or DEF_B contains the attribute "whole" with value DEF_A, words A and B have a whole-part relation, otherwise control passes to the synonym relation recognition unit;
in addition, in the sememe-definition whole-part relation recognition unit, for words whose whole-part relation cannot be recognized by using the sememe definitions directly, a generalization can be applied: the value of the "whole" attribute above is generalized to its hypernym concept, and the remaining operations are unchanged.
Further, the synonym relation recognition unit comprises:
a Cilin synonym relation recognition unit, used for obtaining the synonym set SSET_A of word A from the rows marked with "=" in the HIT Tongyici Cilin (Extended Edition); if B ∈ SSET_A, words A and B have a synonym relation, otherwise control passes to the HowNet synonym relation recognition unit;
a HowNet synonym relation recognition unit, used for extracting the synonym set SSET_A of word A using HowNet; if B ∈ SSET_A, words A and B have a synonym relation, otherwise control passes to the Baidu Chinese synonym relation recognition unit;
a Baidu Chinese synonym relation recognition unit, used for extracting the synonym set SSET_A of word A using Baidu Chinese; if B ∈ SSET_A, words A and B have a synonym relation, otherwise control passes to the Baidu Encyclopedia synonym relation recognition unit;
a Baidu Encyclopedia synonym relation recognition unit, used for obtaining the encyclopedia link page sets PSET_A and PSET_B of words A and B from the page links of Baidu Encyclopedia; if PSET_A and PSET_B satisfy the condition given in the original formula image (Figure BDA0001381833890000043, not reproduced in this text), words A and B have a synonym relation, otherwise control passes to the hypernym-hyponym relation recognition unit.
Further, the hypernym-hyponym relation recognition unit comprises:
a HowNet hypernym-hyponym relation recognition unit, used for extracting the hyponym sets HSET_A and HSET_B of words A and B using HowNet; if B ∈ HSET_A or A ∈ HSET_B, words A and B have a hypernym-hyponym relation, otherwise control passes to the sememe-definition hypernym-hyponym relation recognition unit;
a sememe-definition hypernym-hyponym relation recognition unit, used for extracting the sememe definition sets DEFSET_A and DEFSET_B of words A and B according to the hypernym-hyponym relations implied by the HowNet sememe definitions; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B whose primary sememes are identical and which satisfy either of the two conditions given in the original formula images (Figures BDA0001381833890000041 and BDA0001381833890000042, not reproduced in this text), words A and B have a hypernym-hyponym relation.
The beneficial effects of the invention are as follows:
1. The invention uses multiple Chinese knowledge resources to recognize the semantic relations of words and makes full use of each knowledge resource.
2. In whole-part relation recognition, the method is supplemented by a generalization operation tailored to the characteristics of HowNet sememe definitions, which improves the adaptability of the recognition method.
3. In hypernym-hyponym relation recognition, the invention fully mines the information contained in the HowNet sememe definitions, which effectively improves recognition accuracy.
4. The Chinese word semantic relation recognition method and device based on multiple Chinese knowledge resources can automatically recognize the semantic relation of a given word pair, including antonym, whole-part, synonym, and hypernym-hyponym relations, with high recognition accuracy.
Drawings
FIG. 1 is a flow chart of a Chinese word semantic relation recognition method based on multiple Chinese knowledge resources according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a Chinese word semantic relation recognition device based on multiple Chinese knowledge resources according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an antonym relation recognition unit according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a whole-part relation recognition unit according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a synonym relation recognition unit according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a hypernym-hyponym relation recognition unit according to an embodiment of the present invention.
Detailed Description
To help those skilled in the art better understand the solution of the embodiments of the invention, the embodiments are described in detail below with reference to the accompanying drawings and implementation modes.
The recognition process is illustrated with the word pair consisting of word A "motor vehicle" and word B "truck".
The flow chart of the Chinese word semantic relation recognition method based on multiple Chinese knowledge resources in the embodiment of the invention is shown in FIG. 1 and comprises the following steps:
Step 101: antonym relation recognition.
An antonym set is obtained by combining multiple Chinese knowledge resources, and whether the words have an antonym relation is judged from the antonym set, specifically:
Step 1-1) for given words A and B, extract the antonym set ASET_A of word A using the antonym relations explicitly defined in HowNet; if B ∈ ASET_A, the two words have an antonym relation, otherwise go to step 1-2); in addition, the converse relation defined in HowNet is also treated as an antonym relation;
Extracting the antonyms (including converse words) of word A "motor vehicle" from HowNet yields ASET_A = {"trailer", "cart", "dongfu car", "wheelbarrow", "rickshaw", "yellow croaker", "skeleton car", "rubber car", "bicycle", "horse car", "donkey car", "cow car", "volleyball", "flatbed car", "flatbed tricycle", "rickshaw", "tricycle", "mountain bike", "handcart", "animal car", "cart", "trolley", "ocean car", "moped", "bicycle", "a light horse cart", "chariot", "hub", "halter strap"}. Clearly word B "truck" ∉ ASET_A, so step 1-2) is performed.
Step 1-2) extract the antonym set ASET_A of the given word A using Baidu Chinese, and extract the synonym set SSET_A of word A using the HIT Tongyici Cilin (Extended Edition); for each word W ∈ SSET_A, extract its antonyms and merge them into ASET_A; if B ∈ ASET_A, words A and B have an antonym relation, otherwise go to step 1-3);
Extracting the antonyms of word A "motor vehicle" from Baidu Chinese yields the set ASET_A shown in the original formula image (Figure BDA0001381833890000062, not reproduced in this text). Since word B "truck" ∉ ASET_A, step 1-3) is performed.
Step 1-3) extract the antonym set ASET_A of word A using Baidu Encyclopedia; if B ∈ ASET_A, the two words have an antonym relation, otherwise go to step 2-1).
Extracting the antonyms of word A "motor vehicle" from Baidu Encyclopedia yields the set ASET_A shown in the original formula image (Figure BDA0001381833890000064, not reproduced in this text). Since word B "truck" ∉ ASET_A, the method goes to step 2-1).
Step 102: whole-part relation recognition.
A part-word set is extracted using multiple Chinese knowledge resources, and whether the words have a whole-part relation is judged from the part-word set, specifically:
Step 2-1) extract the part-word sets MSET_A and MSET_B of words A and B using HowNet; if B ∈ MSET_A or A ∈ MSET_B, the two words have a whole-part relation, otherwise go to step 2-2);
Extracting the part-word sets of words A "motor vehicle" and B "truck" from HowNet yields MSET_A = {"headlight", "steering wheel", "sun visor", "trunk", "rear window", "tailgate", "rear lamp", "rear mirror", "cab", "sidecar", "straddle bucket", "automobile engine", "automobile horn", "automobile accessory", "cylinder", "headlight", "fuel gauge", "tail lamp", "trunk", "throttle", "sun visor roof"}, and the set MSET_B shown in the original formula image (Figure BDA0001381833890000061, not reproduced in this text). Since B "truck" ∉ MSET_A and A "motor vehicle" ∉ MSET_B, the method goes to step 2-2).
Step 2-2) use the HowNet sememe definitions: a definition containing the sememe "part" indicates that the word denotes a part of some whole, and the value of the "whole" attribute in that definition gives the sememe definition of the corresponding whole; on this basis, extract the sememe definition sets DEFSET_A and DEFSET_B of words A and B; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B such that DEF_A contains the attribute "whole" with value DEF_B, or DEF_B contains the attribute "whole" with value DEF_A, words A and B have a whole-part relation, otherwise go to step 3-1);
in addition, for words whose whole-part relation cannot be recognized by using the sememe definitions directly, a generalization can be applied: the value of the "whole" attribute above is generalized to its hypernym concept, and the remaining operations are unchanged.
Using HowNet, the sememe definition sets of words A "motor vehicle" and B "truck" are extracted as DEFSET_A = { LandVehicle { "{ automotive = automatic } }" } and DEFSET_B = { LandVehicle { [ automatic ], { transport | transport { [ from }, and (physical | substance } } } "}. Evidently no DEF_A ∈ DEFSET_A or DEF_B ∈ DEFSET_B contains the sememe "part | part", so the method goes to step 3-1).
Step 103: synonym relation recognition.
A synonym set is extracted using multiple Chinese knowledge resources, and whether the words have a synonym relation is judged from the synonym set, specifically:
Step 3-1) in the HIT Tongyici Cilin (Extended Edition), the rows marked with "=" list synonyms; obtain the synonym set SSET_A of word A from these rows; if B ∈ SSET_A, words A and B have a synonym relation, otherwise go to step 3-2);
Extracting the synonyms of word A "motor vehicle" from the HIT Tongyici Cilin (Extended Edition) yields the set SSET_A shown in the original formula image (Figure BDA0001381833890000074, not reproduced in this text). Since B "truck" ∉ SSET_A, the method goes to step 3-2).
Step 3-2) extract the synonym set SSET_A of word A using HowNet; if B ∈ SSET_A, words A and B have a synonym relation, otherwise go to step 3-3);
In HowNet, the synonym set of word A "motor vehicle" is extracted as SSET_A = {"motor vehicle", "automobile", "car", "sleeper"}. Since B "truck" ∉ SSET_A, the method goes to step 3-3).
Step 3-3) extract the synonym set SSET_A of word A using Baidu Chinese; if B ∈ SSET_A, words A and B have a synonym relation, otherwise go to step 3-4);
In Baidu Chinese, the synonym set of word A "motor vehicle" is extracted as the set SSET_A shown in the original formula image (Figure BDA0001381833890000071, not reproduced in this text). Since B "truck" ∉ SSET_A, the method goes to step 3-4).
Step 3-4) obtain the encyclopedia link page sets PSET_A and PSET_B of words A and B from the page links of Baidu Encyclopedia; if PSET_A and PSET_B satisfy the condition given in the original formula image (Figure BDA0001381833890000072, not reproduced in this text), words A and B have a synonym relation, otherwise go to step 4-1).
In Baidu Encyclopedia, the link page sets of words A "motor vehicle" and B "truck" are extracted as PSET_A (its entries are truncated in this text, beginning "https://baike") and PSET_B = {"https://baike.baidu.com/item/truck/4339", "https://baike.baidu.com/item/truck/15281831", "https://baike.baidu.com/item/truck/622401", "https://baike.baidu.com/item/truck/3697802", "https://baike.baidu.com/item/truck/7109303", "https://baike.baidu.com/item/truck/3697784"}. Since the condition is not satisfied (the comparison is given in Figure BDA0001381833890000073, not reproduced in this text), the method goes to step 4-1).
Step 104: hypernym-hyponym relation recognition.
A hyponym set is extracted using multiple Chinese knowledge resources, and whether the words have a hypernym-hyponym relation is judged from the hyponym set, specifically:
Step 4-1) extract the hyponym sets HSET_A and HSET_B of words A and B using HowNet; if B ∈ HSET_A or A ∈ HSET_B, words A and B have a hypernym-hyponym relation, otherwise go to step 4-2);
Extracting the hyponyms of words A "motor vehicle" and B "truck" in HowNet yields HSET_A = {"audi", "bus", "regular bus", "charter", "bmw", "galloping", "honk", "coach", "taxi", "coach", "bus", "universe", "taxi", "tram", "toyota", "ford", "bus", "black car", "truck", "container car", "airport bus", "emergency tender", "taxi", "traffic van", "coach", "police car", "ambulance", "old car", "truck", "cadilac", "empty car", "linken", "leak car", "truck", "kadi lac", "taxi", [several entries garbled in translation], "trolley", "station wagon", "tourist coach", "merseidess", "minibus", "shuttle coach", "double bus", "private car", "commuter coach", "shuttle bus", "trolley bus", "modern", "fire truck", "minibus", "mini-bus", "minibus", "cruiser", "patrol car", "tourist coach", "cross-country vehicle", "transport vehicle", "truck", "dump truck", "dump truck"}, and the set HSET_B shown in the original formula image (Figure BDA0001381833890000083, not reproduced in this text).
Word B "truck" belongs to HSET_A, so words A "motor vehicle" and B "truck" have a hypernym-hyponym relation, and the semantic relation recognition is complete at this point.
Step 4-2) extract the sememe definition sets DEFSET_A and DEFSET_B of words A and B according to the hypernym-hyponym relations implied by the HowNet sememe definitions; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B whose primary sememes are identical and which satisfy either of the two conditions given in the original formula images (Figures BDA0001381833890000084 and BDA0001381833890000085, not reproduced in this text), words A and B have a hypernym-hyponym relation.
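The walk-through for "motor vehicle" and "truck" can be replayed with a short trace. The sketch below models only the HowNet-based steps, using small subsets of the sets listed above; the steps that rely on the thesaurus, Baidu Chinese, and Baidu Encyclopedia are omitted because their sets are not reproduced in this text. The table names and the `trace` function are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Small subsets of the HowNet sets listed in the walk-through.
HOWNET_ANTONYMS: Dict[str, Set[str]] = {"motor vehicle": {"trailer", "bicycle", "handcart"}}
HOWNET_PARTS: Dict[str, Set[str]] = {"motor vehicle": {"headlight", "steering wheel", "cab"}}
HOWNET_SYNONYMS: Dict[str, Set[str]] = {"motor vehicle": {"automobile", "car", "sleeper"}}
HOWNET_HYPONYMS: Dict[str, Set[str]] = {"motor vehicle": {"bus", "taxi", "truck"}}

# (step label, relation name, lexicon, test both directions?)
STEPS = [
    ("1-1", "antonym", HOWNET_ANTONYMS, False),
    ("2-1", "whole-part", HOWNET_PARTS, True),
    ("3-2", "synonym", HOWNET_SYNONYMS, False),
    ("4-1", "hypernym-hyponym", HOWNET_HYPONYMS, True),
]


def trace(a: str, b: str) -> Tuple[List[str], Optional[str]]:
    """Return the steps visited and the relation found, if any."""
    visited: List[str] = []
    for label, relation, lexicon, both in STEPS:
        visited.append(label)
        hit = b in lexicon.get(a, set())
        if both:
            hit = hit or a in lexicon.get(b, set())
        if hit:
            return visited, relation
    return visited, None


print(trace("motor vehicle", "truck"))
# (['1-1', '2-1', '3-2', '4-1'], 'hypernym-hyponym')
```

The trace stops at step 4-1), matching the walk-through, so step 4-2) is never reached for this pair.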
In the same way, semantic relation recognition can be carried out for the word pair "person" and "head". To illustrate the generalization operation, the description below jumps directly to step 2-2).
In HowNet, the sememe definition sets of words A "person" and B "head" are extracted as DEFSET_A = { Behavior | hold = host = human } }, "{ Physique = host = animal } }," { Strength | power: host = community } }, "{ human }", "{ human | human = 3rdPerson }," { human | human: persona = 3rdPerson } }, "{ human | human: persona = 3rdPerson = he }", "{ human | human: persona = 3rdPerson = he }," { human = quality = mass } }, "{ human | human: modifier = mass = and" { human = modifier { }, "{ human { (physical }," "human { (human }," human { (human }, "human { (human },", and/or adult | adult } }, "{ human | quality = mass }," { human | person: { environment { }, "{ human | person: { environment: agent { - }, content = fact: modefier = specific } }, and DEFSET_B, whose definition contains the sememe "part", the position "head", and a "whole" attribute referring to an animal concept. Evidently there exist no DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B such that DEF_A contains the attribute "whole" with value DEF_B, or DEF_B contains the attribute "whole" with value DEF_A. A generalization operation is therefore performed: the definition "{human}" in DEF_A is generalized to its hypernym concept "{AnimalHuman | animal}". Now there exists DEF_B ∈ DEFSET_B that contains the attribute "whole" with value "{AnimalHuman | animal}", so the words "person" and "head" have a whole-part relation.
Through the above steps, the semantic relation recognition of a given word pair can be completed.
Correspondingly, the embodiment of the invention also provides a Chinese word semantic relation recognition device based on multiple Chinese knowledge resources, and the structural schematic diagram of the device is shown in FIG. 2.
In this embodiment, the apparatus comprises:
an antonym (antisense) relation recognition unit 201, configured to obtain an antonym set using multiple Chinese knowledge resources, and to determine from the antonym set whether an antonym relation exists between the words;
a whole-part relation recognition unit 202, configured to extract a part-word set using multiple Chinese knowledge resources, and to determine from the part-word set whether a whole-part relation exists between the words;
a synonym relation recognition unit 203, configured to extract a synonym set using multiple Chinese knowledge resources, and to determine from the synonym set whether a synonym relation exists between the words;
and a hypernym-hyponym (superior-subordinate) relation recognition unit 204, configured to extract a hyponym set using multiple Chinese knowledge resources, and to determine from the hyponym set whether a hypernym-hyponym relation exists between the words.
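The way units 201 to 204 hand a word pair from one to the next can be sketched as an ordered cascade. This is a minimal illustration, not the patented implementation: the names `recognize_relation`, `in_pairs` and `PIPELINE` are hypothetical, and the English toy pairs stand in for real lookups in HowNet, Baidu Chinese and the other resources.

```python
from typing import Callable, List, Optional, Tuple

# A recognizer takes a word pair and reports whether its relation holds.
Recognizer = Callable[[str, str], bool]

def recognize_relation(a: str, b: str,
                       pipeline: List[Tuple[str, Recognizer]]) -> Optional[str]:
    """Try each unit in order and return the first relation that holds."""
    for name, recognizer in pipeline:
        if recognizer(a, b):
            return name
    return None  # no unit recognized a relation

def in_pairs(pairs):
    # Toy lookup (assumption): accept the pair in either order.
    return lambda a, b: (a, b) in pairs or (b, a) in pairs

# Hypothetical stand-ins for units 201-204, in the order the text gives.
PIPELINE = [
    ("antonym", in_pairs({("hot", "cold")})),
    ("whole-part", in_pairs({("person", "head")})),
    ("synonym", in_pairs({("big", "large")})),
    ("hypernym-hyponym", in_pairs({("animal", "dog")})),
]

result = recognize_relation("head", "person", PIPELINE)  # "whole-part"
```

Keeping the order in a list makes the fall-through behaviour explicit: a pair reaches a later unit only when every earlier unit has failed.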
Fig. 3 is a schematic structural diagram of the antonym relation recognition unit 201 of the device shown in Fig. 2, which includes:
a HowNet antonym relation recognition unit 301, configured to extract, for given words A and B, the antonym set ASET_A of word A using the antonym relations explicitly defined in HowNet; if B ∈ ASET_A, the two words have an antonym relation, otherwise control passes to the Baidu Chinese antonym relation recognition unit; in addition, the sense relation defined in HowNet is also treated as an antonym relation;
a Baidu Chinese antonym relation recognition unit 302, configured to extract the antonym set ASET_A of the given word A using Baidu Chinese, to extract the synonym set SSET_A of word A using the extended edition of the HIT Tongyici Cilin (the Harbin Institute of Technology synonym thesaurus), and, for each word W ∈ SSET_A, to extract its antonyms and merge them into ASET_A; if B ∈ ASET_A, words A and B have an antonym relation, otherwise control passes to the Baidu encyclopedia antonym relation recognition unit;
a Baidu encyclopedia antonym relation recognition unit 303, configured to extract the antonym set ASET_A of word A using Baidu encyclopedia; if B ∈ ASET_A, the two words have an antonym relation, otherwise control passes to the whole-part relation recognition unit.
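The three antonym sub-units can be sketched as follows. This is a rough sketch under stated assumptions: each resource is modelled as a plain dictionary from a word to a set of words, the function name `antonym_relation` is hypothetical, and the antonyms of each synonym W are looked up in the Baidu Chinese table, which the text does not state explicitly.

```python
def antonym_relation(a, b, hownet_ant, baidu_ant, cilin_syn, baike_ant):
    """Return the resource that confirms b as an antonym of a, else None."""
    # Unit 301: antonyms of A explicitly defined in HowNet.
    if b in hownet_ant.get(a, set()):
        return "HowNet"
    # Unit 302: Baidu Chinese antonyms of A, extended with the antonyms
    # of every synonym W of A (lookup source for W is an assumption).
    aset = set(baidu_ant.get(a, set()))
    for w in cilin_syn.get(a, set()):
        aset |= baidu_ant.get(w, set())
    if b in aset:
        return "Baidu Chinese"
    # Unit 303: antonyms of A listed in Baidu encyclopedia.
    if b in baike_ant.get(a, set()):
        return "Baidu encyclopedia"
    return None  # hand over to the whole-part unit

# Toy data (hypothetical, English for readability).
HOWNET_ANT = {"good": {"bad"}}
BAIDU_ANT = {"glad": {"sad"}}
CILIN_SYN = {"happy": {"glad"}}
BAIKE_ANT = {"up": {"down"}}

source = antonym_relation("happy", "sad",
                          HOWNET_ANT, BAIDU_ANT, CILIN_SYN, BAIKE_ANT)
# source == "Baidu Chinese": "sad" is reached through the synonym "glad"
```

The synonym expansion in unit 302 is what lets the pair ("happy", "sad") succeed even though "happy" itself has no antonym entry in the toy tables.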
Fig. 4 is a schematic structural diagram of the whole-part relation recognition unit 202 of the device shown in Fig. 2, which includes:
a HowNet whole-part relation recognition unit 401, configured to extract the part-word sets MSET_A and MSET_B of words A and B using HowNet; if B ∈ MSET_A or A ∈ MSET_B, the two words have a whole-part relation, otherwise control passes to the sememe-definition whole-part relation recognition unit;
a sememe-definition whole-part relation recognition unit 402, configured to work on HowNet sememe definitions: a definition containing the sememe "part" indicates that the word is a part word of some word, and the value of the "whole" attribute in that definition gives the sememe definition of its whole word; on this basis the sememe definition sets DEFSET_A and DEFSET_B of words A and B are extracted; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B such that DEF_A contains the attribute "whole" with value DEF_B, or DEF_B contains the attribute "whole" with value DEF_A, then words A and B have a whole-part relation, otherwise control passes to the synonym relation recognition unit;
in addition, for some words the whole-part relation cannot be recognized effectively by using the sememe definitions directly; in the sememe-definition whole-part relation recognition unit such words may be handled by generalization, in which the value of the "whole" attribute described above is generalized to its hypernym concept while the remaining operations are unchanged.
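The sememe-definition check with its generalization fallback can be illustrated with a toy model. Everything here is an assumption made for illustration: the `Def` class, the `HYPERNYM` table and the function names are hypothetical, a definition is reduced to a primary sememe plus attribute pairs, and the "whole" value is compared against the other word's primary sememe. Generalization follows the worked "person"/"head" example, where "human" is lifted to "AnimalHuman".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Def:
    head: str          # primary sememe, e.g. "part" or "human"
    attrs: tuple = ()  # (attribute, value) pairs, e.g. (("whole", "AnimalHuman"),)

# Toy hypernym table (hypothetical): concept -> its hypernym concept.
HYPERNYM = {"human": "AnimalHuman"}

def whole_values(d):
    """Values of the "whole" attribute in one definition."""
    return {v for k, v in d.attrs if k == "whole"}

def defs_match(defset_a, defset_b, generalize=False):
    """True if a "part" definition on one side names the other side as its whole."""
    def heads(defset):
        hs = {d.head for d in defset}
        if generalize:
            hs |= {HYPERNYM[h] for h in hs if h in HYPERNYM}
        return hs

    def match(parts, wholes):
        targets = heads(wholes)
        return any(d.head == "part" and bool(whole_values(d) & targets)
                   for d in parts)

    return match(defset_a, defset_b) or match(defset_b, defset_a)

def is_whole_part(defset_a, defset_b):
    # Direct comparison first, generalized comparison only as a fallback.
    return (defs_match(defset_a, defset_b)
            or defs_match(defset_a, defset_b, generalize=True))

PERSON = [Def("human")]
HEAD = [Def("part", (("PartPosition", "head"), ("whole", "AnimalHuman")))]
```

With these toy definitions the direct comparison fails, because "AnimalHuman" is not the primary sememe of "person", and the generalized comparison succeeds.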
Fig. 5 is a schematic structural diagram of the synonym relation recognition unit 203 of the device shown in Fig. 2, which includes:
a Cilin synonym relation recognition unit 501, configured to obtain the synonym set SSET_A of word A from the rows marked "=" in the extended edition of the HIT Tongyici Cilin, which denote synonyms; if B ∈ SSET_A, words A and B have a synonym relation, otherwise control passes to the HowNet synonym relation recognition unit;
a HowNet synonym relation recognition unit 502, configured to extract the synonym set SSET_A of word A using HowNet; if B ∈ SSET_A, words A and B have a synonym relation, otherwise control passes to the Baidu Chinese synonym relation recognition unit;
a Baidu Chinese synonym relation recognition unit 503, configured to extract the synonym set SSET_A of word A using Baidu Chinese; if B ∈ SSET_A, words A and B have a synonym relation, otherwise control passes to the Baidu encyclopedia synonym relation recognition unit;
a Baidu encyclopedia synonym relation recognition unit 504, configured to obtain the Baidu encyclopedia link page sets PSET_A and PSET_B of words A and B from the page links of Baidu encyclopedia; if the condition
Figure BDA0001381833890000103
is satisfied, words A and B have a synonym relation, otherwise control passes to the hypernym-hyponym relation recognition unit.
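Building SSET_A from the "="-marked rows of the thesaurus might look like the sketch below. It assumes each row is a whitespace-separated line whose first field is a category code ending in a marker character, with "=" marking a row of synonyms. The function name `build_synonym_sets` and the sample rows are hypothetical, not real thesaurus entries.

```python
from collections import defaultdict

def build_synonym_sets(rows):
    """Map each word to the words that share an "="-marked row with it."""
    sset = defaultdict(set)
    for row in rows:
        fields = row.split()
        if len(fields) < 2:
            continue  # nothing usable on this row
        code, words = fields[0], fields[1:]
        if not code.endswith("="):
            continue  # only "=" rows denote synonyms
        for w in words:
            sset[w].update(x for x in words if x != w)
    return sset

# Illustrative rows (assumed layout): code with marker, then the words.
ROWS = [
    "Xx01A01= big large huge",
    "Xx01A02# apple pear",
    "Xx01A03@ sole",
]

SSET = build_synonym_sets(ROWS)
is_synonym = "large" in SSET.get("big", set())  # True
```

Rows with other markers are skipped, so "apple" and "pear" do not become synonyms in this sketch.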
Fig. 6 is a schematic structural diagram of the hypernym-hyponym relation recognition unit 204 of the device shown in Fig. 2, which includes:
a HowNet hypernym-hyponym relation recognition unit 601, configured to extract the hyponym sets HSET_A and HSET_B of words A and B using HowNet; if B ∈ HSET_A or A ∈ HSET_B, words A and B have a hypernym-hyponym relation, otherwise control passes to the sememe-definition hypernym-hyponym relation recognition unit;
a sememe-definition hypernym-hyponym relation recognition unit 602, configured to extract the sememe definition sets DEFSET_A and DEFSET_B of words A and B according to the hypernym-hyponym relations implied by HowNet sememe definitions; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B whose primary sememes are consistent and which satisfy
Figure BDA0001381833890000101
or
Figure BDA0001381833890000102
then words A and B have a hypernym-hyponym relation.
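The set test of unit 601 (B ∈ HSET_A or A ∈ HSET_B) can be sketched over a toy taxonomy. The `children` table and both function names are hypothetical, and collecting indirect hyponyms as well as direct ones is an assumption of this sketch; the text does not say how deep HSET reaches. The conditions of unit 602 are given only in figures and are therefore not sketched.

```python
def hyponym_set(word, children):
    """Collect direct and indirect hyponyms of word from a toy taxonomy."""
    seen = set()
    stack = list(children.get(word, ()))
    while stack:
        w = stack.pop()
        if w not in seen:
            seen.add(w)
            stack.extend(children.get(w, ()))
    return seen

def has_hypernym_relation(a, b, children):
    # Unit 601: B in HSET_A, or A in HSET_B.
    return b in hyponym_set(a, children) or a in hyponym_set(b, children)

# Toy taxonomy (hypothetical): hypernym -> list of direct hyponyms.
CHILDREN = {"animal": ["dog", "cat"], "dog": ["puppy"]}
```

Because the test is applied in both directions, the order in which the two words are supplied does not matter.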
The Chinese word semantic relation recognition device based on various Chinese knowledge resources shown in fig. 2 to 6 can be integrated into various hardware entities. For example, a Chinese term semantic relationship recognition device based on multiple Chinese knowledge resources can be integrated into: personal computers, smart phones, workstations, and the like.
The Chinese word semantic relation recognition method based on multiple Chinese knowledge resources provided by the embodiment of the invention can be stored on various storage media as instructions or instruction sets. Such storage media include, but are not limited to: floppy disks, optical disks, hard disks, memory, USB flash drives, CF cards, SM cards, and the like.
In summary, in the embodiment of the present invention, an antonym (antisense) set is obtained by combining multiple Chinese knowledge resources, and whether an antonym relation exists between words is determined from the antonym set; a part-word set is extracted using multiple Chinese knowledge resources, and whether a whole-part relation exists between words is determined from the part-word set; a synonym set is extracted using multiple Chinese knowledge resources, and whether a synonym relation exists between words is determined from the synonym set; a hyponym set is extracted using multiple Chinese knowledge resources, and whether a hypernym-hyponym relation exists between words is determined from the hyponym set. The embodiment thus realizes Chinese word semantic relation recognition based on multiple Chinese knowledge resources. It draws on several different Chinese knowledge resources to recognize the semantic relation of words and makes full use of each of them; in whole-part recognition, a generalization method supplements the approach to suit the characteristics of HowNet sememe definitions, which improves the adaptability of the recognition method; in hypernym-hyponym recognition, the information contained in HowNet sememe definitions is fully mined, which effectively improves recognition accuracy. The method and device can automatically recognize the semantic relation of a given word pair, including antonym, whole-part, synonym and hypernym-hyponym relations, with high recognition accuracy.
The embodiments in this specification are described in a progressive manner, and like parts may be referred to each other. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, it is relatively simple to describe, and reference may be made to some descriptions of the method embodiment for relevant points.
The foregoing detailed description of the embodiments of the present invention has been presented for purposes of illustration and description and is intended to be exemplary only of the method and apparatus for practicing the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and therefore the present specification should not be construed as limiting the present invention.

Claims (2)

1. A Chinese word semantic relation recognition method based on multiple Chinese knowledge resources, characterized by comprising the following steps:
first, obtaining an antonym (antisense) set by combining multiple Chinese knowledge resources, and determining from the antonym set whether an antonym relation exists between the words;
step 1-1) for given words A and B, extracting the antonym set ASET_A of word A using the antonym relations explicitly defined in HowNet; if B ∈ ASET_A, the two words have an antonym relation, otherwise going to step 1-2); in addition, the sense relation defined in HowNet is also treated as an antonym relation;
step 1-2) extracting the antonym set ASET_A of the given word A using Baidu Chinese, extracting the synonym set SSET_A of word A using the extended edition of the HIT Tongyici Cilin, and, for each word W ∈ SSET_A, extracting its antonyms and merging them into ASET_A; if B ∈ ASET_A, words A and B have an antonym relation, otherwise going to step 1-3);
step 1-3) extracting the antonym set ASET_A of word A using Baidu encyclopedia; if B ∈ ASET_A, the two words have an antonym relation, otherwise going to step 2-1);
second, extracting a part-word set using multiple Chinese knowledge resources, and determining from the part-word set whether a whole-part relation exists between the words;
step 2-1) extracting the part-word sets MSET_A and MSET_B of words A and B, respectively, using HowNet; if B ∈ MSET_A or A ∈ MSET_B, the two words have a whole-part relation, otherwise going to step 2-2);
step 2-2) working on HowNet sememe definitions, wherein a definition containing the sememe "part" indicates that the word is a part word of some word, and the value of the "whole" attribute in that definition gives the sememe definition of its whole word; extracting on this basis the sememe definition sets DEFSET_A and DEFSET_B of words A and B; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B such that DEF_A contains the attribute "whole" with value DEF_B, or DEF_B contains the attribute "whole" with value DEF_A, then words A and B have a whole-part relation, otherwise going to step 3-1);
in addition, for some words the whole-part relation cannot be recognized effectively by using the sememe definitions directly; such words may be handled by generalization, in which the value of the "whole" attribute is generalized to its hypernym concept while the remaining operations are unchanged;
third, extracting a synonym set using multiple Chinese knowledge resources, and determining from the synonym set whether a synonym relation exists between the words;
step 3-1) obtaining the synonym set SSET_A of word A from the rows marked "=" in the extended edition of the HIT Tongyici Cilin, which denote synonyms; if B ∈ SSET_A, words A and B have a synonym relation, otherwise going to step 3-2);
step 3-2) extracting the synonym set SSET_A of word A using HowNet; if B ∈ SSET_A, words A and B have a synonym relation, otherwise going to step 3-3);
step 3-3) extracting the synonym set SSET_A of word A using Baidu Chinese; if B ∈ SSET_A, words A and B have a synonym relation, otherwise going to step 3-4);
step 3-4) obtaining the Baidu encyclopedia link page sets PSET_A and PSET_B of words A and B, respectively, from the page links of Baidu encyclopedia; if the condition
Figure FDA0003484072580000021
is satisfied, words A and B have a synonym relation, otherwise going to step 4-1);
fourth, extracting a hyponym set using multiple Chinese knowledge resources, and determining from the hyponym set whether a hypernym-hyponym relation exists between the words;
step 4-1) extracting the hyponym sets HSET_A and HSET_B of words A and B, respectively, using HowNet; if B ∈ HSET_A or A ∈ HSET_B, words A and B have a hypernym-hyponym relation, otherwise going to step 4-2);
step 4-2) extracting the sememe definition sets DEFSET_A and DEFSET_B of words A and B, respectively, according to the hypernym-hyponym relations implied by HowNet sememe definitions; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B whose primary sememes are consistent and which satisfy
Figure FDA0003484072580000022
Figure FDA0003484072580000023
or
Figure FDA0003484072580000024
then words A and B have a hypernym-hyponym relation.
2. A Chinese word semantic relation recognition device based on multiple Chinese knowledge resources, characterized by comprising an antonym (antisense) relation recognition unit, a whole-part relation recognition unit, a synonym relation recognition unit and a hypernym-hyponym relation recognition unit, wherein:
the antonym relation recognition unit is configured to obtain an antonym set using multiple Chinese knowledge resources, and to determine from the antonym set whether an antonym relation exists between the words;
the whole-part relation recognition unit is configured to extract a part-word set using multiple Chinese knowledge resources, and to determine from the part-word set whether a whole-part relation exists between the words;
the synonym relation recognition unit is configured to extract a synonym set using multiple Chinese knowledge resources, and to determine from the synonym set whether a synonym relation exists between the words;
the hypernym-hyponym relation recognition unit is configured to extract a hyponym set using multiple Chinese knowledge resources, and to determine from the hyponym set whether a hypernym-hyponym relation exists between the words;
the antonym relation recognition unit further comprises:
a HowNet antonym relation recognition unit, configured to extract, for given words A and B, the antonym set ASET_A of word A using the antonym relations explicitly defined in HowNet; if B ∈ ASET_A, the two words have an antonym relation, otherwise control passes to the Baidu Chinese antonym relation recognition unit; in addition, the sense relation defined in HowNet is also treated as an antonym relation;
a Baidu Chinese antonym relation recognition unit, configured to extract the antonym set ASET_A of the given word A using Baidu Chinese, to extract the synonym set SSET_A of word A using the extended edition of the HIT Tongyici Cilin, and, for each word W ∈ SSET_A, to extract its antonyms and merge them into ASET_A; if B ∈ ASET_A, words A and B have an antonym relation, otherwise control passes to the Baidu encyclopedia antonym relation recognition unit;
a Baidu encyclopedia antonym relation recognition unit, configured to extract the antonym set ASET_A of word A using Baidu encyclopedia; if B ∈ ASET_A, the two words have an antonym relation, otherwise control passes to the whole-part relation recognition unit;
the whole-part relation recognition unit further comprises:
a HowNet whole-part relation recognition unit, configured to extract the part-word sets MSET_A and MSET_B of words A and B, respectively, using HowNet; if B ∈ MSET_A or A ∈ MSET_B, the two words have a whole-part relation, otherwise control passes to the sememe-definition whole-part relation recognition unit;
a sememe-definition whole-part relation recognition unit, configured to work on HowNet sememe definitions, wherein a definition containing the sememe "part" indicates that the word is a part word of some word, and the value of the "whole" attribute in that definition gives the sememe definition of its whole word; on this basis the sememe definition sets DEFSET_A and DEFSET_B of words A and B are extracted; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B such that DEF_A contains the attribute "whole" with value DEF_B, or DEF_B contains the attribute "whole" with value DEF_A, then words A and B have a whole-part relation, otherwise control passes to the synonym relation recognition unit;
in addition, for some words the whole-part relation cannot be recognized effectively by using the sememe definitions directly; in the sememe-definition whole-part relation recognition unit such words may be handled by generalization, in which the value of the "whole" attribute is generalized to its hypernym concept while the remaining operations are unchanged;
the synonym relation recognition unit further comprises:
a Cilin synonym relation recognition unit, configured to obtain the synonym set SSET_A of word A from the rows marked "=" in the extended edition of the HIT Tongyici Cilin, which denote synonyms; if B ∈ SSET_A, words A and B have a synonym relation, otherwise control passes to the HowNet synonym relation recognition unit;
a HowNet synonym relation recognition unit, configured to extract the synonym set SSET_A of word A using HowNet; if B ∈ SSET_A, words A and B have a synonym relation, otherwise control passes to the Baidu Chinese synonym relation recognition unit;
a Baidu Chinese synonym relation recognition unit, configured to extract the synonym set SSET_A of word A using Baidu Chinese; if B ∈ SSET_A, words A and B have a synonym relation, otherwise control passes to the Baidu encyclopedia synonym relation recognition unit;
a Baidu encyclopedia synonym relation recognition unit, configured to obtain the Baidu encyclopedia link page sets PSET_A and PSET_B of words A and B, respectively, from the page links of Baidu encyclopedia; if the condition
Figure FDA0003484072580000031
is satisfied, words A and B have a synonym relation, otherwise control passes to the hypernym-hyponym relation recognition unit;
the hypernym-hyponym relation recognition unit further comprises:
a HowNet hypernym-hyponym relation recognition unit, configured to extract the hyponym sets HSET_A and HSET_B of words A and B, respectively, using HowNet; if B ∈ HSET_A or A ∈ HSET_B, words A and B have a hypernym-hyponym relation, otherwise control passes to the sememe-definition hypernym-hyponym relation recognition unit;
a sememe-definition hypernym-hyponym relation recognition unit, configured to extract the sememe definition sets DEFSET_A and DEFSET_B of words A and B, respectively, according to the hypernym-hyponym relations implied by HowNet sememe definitions; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B whose primary sememes are consistent and which satisfy
Figure FDA0003484072580000041
or
Figure FDA0003484072580000042
then words A and B have a hypernym-hyponym relation.
CN201710707420.7A 2017-08-17 2017-08-17 Chinese word semantic relation recognition method and device based on multiple Chinese knowledge resources Active CN107451123B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710707420.7A CN107451123B (en) 2017-08-17 2017-08-17 Chinese word semantic relation recognition method and device based on multiple Chinese knowledge resources

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710707420.7A CN107451123B (en) 2017-08-17 2017-08-17 Chinese word semantic relation recognition method and device based on multiple Chinese knowledge resources

Publications (2)

Publication Number Publication Date
CN107451123A CN107451123A (en) 2017-12-08
CN107451123B true CN107451123B (en) 2022-04-15

Family

ID=60491463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710707420.7A Active CN107451123B (en) 2017-08-17 2017-08-17 Chinese word semantic relation recognition method and device based on multiple Chinese knowledge resources

Country Status (1)

Country Link
CN (1) CN107451123B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086328B (en) * 2018-06-29 2021-03-30 北京百度网讯科技有限公司 Method and device for determining upper and lower position relation, server and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3996125B2 (en) * 2001-09-14 2007-10-24 ソニー株式会社 Sentence generation apparatus and generation method
CN104484411B (en) * 2014-12-16 2017-12-22 中国科学院自动化研究所 A kind of construction method of the semantic knowledge-base based on dictionary

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
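Research on the Application of HowNet in Named Entity Recognition; Zheng Fengqiang et al.; Journal of Chinese Information Processing; 20081119; pp. 2-3 *
Lexical Relation Extraction Based on Vocabulary and the Web; no author given; http://www.doc88.com/p-1146077617476.html; 20150110; pp. 1-5 *
No author given. Lexical Relation Extraction Based on Vocabulary and the Web. http://www.doc88.com/p-1146077617476.html. 2015, *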

Also Published As

Publication number Publication date
CN107451123A (en) 2017-12-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221222

Address after: Room 02A-084, Building C (Second Floor), No. 28, Xinxi Road, Haidian District, Beijing 100085

Patentee after: Jingchuang United (Beijing) Intellectual Property Service Co.,Ltd.

Address before: No. 3501, Daxue Road, science and Technology Park, Xincheng University, Jinan, Shandong Province

Patentee before: Qilu University of Technology

Effective date of registration: 20221222

Address after: Room 606-609, Compound Office Complex Building, No. 757, Dongfeng East Road, Yuexiu District, Guangzhou, Guangdong Province, 510699

Patentee after: China Southern Power Grid Internet Service Co.,Ltd.

Address before: Room 02A-084, Building C (Second Floor), No. 28, Xinxi Road, Haidian District, Beijing 100085

Patentee before: Jingchuang United (Beijing) Intellectual Property Service Co.,Ltd.
