CN107451130B - Chinese word semantic relation recognition method and device combining Chinese and English knowledge resources - Google Patents

Publication number
CN107451130B
CN107451130B CN201710706832.9A CN201710706832A CN107451130B CN 107451130 B CN107451130 B CN 107451130B CN 201710706832 A CN201710706832 A CN 201710706832A CN 107451130 B CN107451130 B CN 107451130B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710706832.9A
Other languages
Chinese (zh)
Other versions
CN107451130A (en)
Inventor
鹿文鹏
孟凡擎
张玉腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology
Priority to CN201710706832.9A
Publication of CN107451130A
Application granted
Publication of CN107451130B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/55: Rule-based translation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis

Abstract

The invention discloses a method and device for recognizing semantic relations between Chinese words by combining Chinese and English knowledge resources. The method comprises the following steps: acquiring an antonym set from multiple Chinese knowledge resources and judging from it whether an antonymy relation exists between the words; extracting a part-word set from multiple Chinese knowledge resources and judging from it whether a whole-part relation exists between the words; extracting a synonym set from multiple Chinese knowledge resources and judging from it whether a synonymy relation exists between the words; extracting a hyponym set from multiple Chinese knowledge resources and judging from it whether a hypernym-hyponym relation exists between the words; translating the Chinese word pair into English using a Chinese-English dictionary; and recognizing the semantic relation of the translated English word pair using English knowledge resources, thereby determining the semantic relation of the original Chinese word pair. The invention makes full use of multiple Chinese knowledge resources and recognizes the semantic relations of Chinese words more accurately and effectively.

Description

Chinese word semantic relation recognition method and device combining Chinese and English knowledge resources
Technical Field
The invention relates to the technical field of natural language processing, in particular to a Chinese word semantic relation recognition method and device combining Chinese and English knowledge resources.
Background
Semantic relation recognition is the automatic determination of the semantic relation that holds between the words of a given word pair. Typical semantic relations include the antonymy relation, the whole-part relation, the synonymy relation, and the hypernym-hyponym relation. Semantic relation recognition is a fundamental task in natural language processing and directly affects word sense disambiguation, ontology construction, machine translation, information retrieval, text classification, and the like.
Most current research on semantic relation recognition targets English. It generally relies on one or more knowledge resources and uses statistical learning methods such as support vector machines and Bayesian classifiers to classify or recognize English semantic relations, with good results. Research on Chinese semantic relation recognition is comparatively scarce, and most of it uses a single knowledge resource together with a statistical learning method. Existing work therefore relies on one knowledge resource only and neglects the mining and use of other language knowledge resources; moreover, statistical learning methods are constrained by the size of the labeled corpus, so their accuracy is hard to guarantee. As various language knowledge resources are built and improved, they complement one another and provide more reliable knowledge for semantic relation recognition.
To address these technical problems in Chinese word semantic relation recognition, the invention fully mines the semantic relations contained in multiple knowledge resources and realizes a Chinese word semantic relation recognition method and device based on multiple Chinese knowledge resources, with the aim of helping to solve the above problems.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention discloses a Chinese word semantic relation recognition method and device combining Chinese and English knowledge resources, so as to judge the semantic relation between Chinese words more accurately and effectively.
To this end, the invention provides the following technical scheme:
A Chinese word semantic relation recognition method combining Chinese and English knowledge resources comprises the following steps:
step one, acquiring an antonym set from multiple Chinese knowledge resources, and judging from the antonym set whether an antonymy relation exists between the words;
step two, extracting a part-word set from multiple Chinese knowledge resources, and judging from the part-word set whether a whole-part relation exists between the words;
step three, extracting a synonym set from multiple Chinese knowledge resources, and judging from the synonym set whether a synonymy relation exists between the words;
step four, extracting a hyponym set from multiple Chinese knowledge resources, and judging from the hyponym set whether a hypernym-hyponym relation exists between the words;
step five, translating the Chinese word pair into English using a Chinese-English dictionary;
step six, recognizing the semantic relation of the English word pair obtained in step five using English knowledge resources, so as to determine the semantic relation of the original Chinese word pair.
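The six steps form a cascade in which each step is tried only when the previous one finds nothing. A minimal Python sketch of that control flow follows. The names `recognize`, `make_set_stage`, and `STAGES` are hypothetical, the lookup tables are invented toy data standing in for the real knowledge resources, and the symmetric membership test is a simplification (the method tests antonymy and synonymy in one direction only).

```python
from typing import Callable, Dict, List, Optional, Set

# A stage takes a word pair and returns a relation label, or None when it
# cannot decide and the next stage should be tried.
Stage = Callable[[str, str], Optional[str]]


def recognize(a: str, b: str, stages: List[Stage]) -> Optional[str]:
    """Run the stages in order and return the first relation found."""
    for stage in stages:
        relation = stage(a, b)
        if relation is not None:
            return relation
    return None


def make_set_stage(label: str, related: Dict[str, Set[str]]) -> Stage:
    """Build a stage from a toy lookup table (a hypothetical stand-in for a
    real knowledge resource). The test is symmetric for brevity."""
    def stage(a: str, b: str) -> Optional[str]:
        if b in related.get(a, set()) or a in related.get(b, set()):
            return label
        return None
    return stage


# Toy data, invented for illustration only.
antonyms = {"hot": {"cold"}}
parts = {"car": {"wheel"}}
synonyms = {"car": {"automobile"}}
hyponyms = {"vehicle": {"car", "truck"}}

STAGES = [
    make_set_stage("antonymy", antonyms),          # step one
    make_set_stage("whole-part", parts),           # step two
    make_set_stage("synonymy", synonyms),          # step three
    make_set_stage("hypernym-hyponym", hyponyms),  # step four
    # Steps five and six (translation, then English resources) would be
    # appended here as one further stage.
]

print(recognize("vehicle", "truck", STAGES))  # hypernym-hyponym
```

Because the order of `STAGES` fixes the priority of the relations, a pair that matches several tables is labeled with the first matching relation only.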
Further, in step one, the antonymy relation is determined specifically as follows:
step 1-1) for given words A and B, extract the antonym set ASET_A of word A using the antonymy relation explicitly defined in HowNet; if B ∈ ASET_A, the two words have an antonymy relation; otherwise go to step 1-2). In addition, the converse relation defined in HowNet is also treated as an antonymy relation;
step 1-2) extract the antonym set ASET_A of the given word A using Baidu Hanyu (Baidu Chinese); extract the synonym set SSET_A of word A using the HIT Tongyici Cilin (Extended) synonym thesaurus; for each word W ∈ SSET_A, extract its antonyms and merge them into ASET_A; if B ∈ ASET_A, words A and B have an antonymy relation; otherwise go to step 1-3);
step 1-3) extract the antonym set ASET_A of word A using Baidu Baike (Baidu Encyclopedia); if B ∈ ASET_A, the two words have an antonymy relation; otherwise go to step 2-1).
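Steps 1-1) to 1-3) can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: the four dictionaries are invented toy tables standing in for HowNet, Baidu Hanyu, the Cilin thesaurus, and Baidu Baike, and the sketch assumes that the antonyms of each synonym W are looked up in the same dictionary resource as those of A.

```python
from typing import Dict, Set

# Toy stand-ins for the resources; every entry is invented.
HOWNET_ANTONYMS: Dict[str, Set[str]] = {"open": {"close"}}
HANYU_ANTONYMS: Dict[str, Set[str]] = {"glad": {"sad"}}
CILIN_SYNONYMS: Dict[str, Set[str]] = {"happy": {"glad", "cheerful"}}
BAIKE_ANTONYMS: Dict[str, Set[str]] = {"rise": {"fall"}}


def is_antonym(a: str, b: str) -> bool:
    # Step 1-1: antonyms (and converse words) listed in HowNet.
    if b in HOWNET_ANTONYMS.get(a, set()):
        return True

    # Step 1-2: antonyms of A from the dictionary, widened with the
    # antonyms of every synonym W of A.
    aset = set(HANYU_ANTONYMS.get(a, set()))
    for w in CILIN_SYNONYMS.get(a, set()):
        aset |= HANYU_ANTONYMS.get(w, set())
    if b in aset:
        return True

    # Step 1-3: antonyms listed in the encyclopedia.
    return b in BAIKE_ANTONYMS.get(a, set())


print(is_antonym("happy", "sad"))  # True, found through the synonym "glad"
```

The synonym expansion in step 1-2 is what lets a pair be recognized even when the dictionary lists no antonym for A itself.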
Further, in step two, the whole-part relation is determined specifically as follows:
step 2-1) extract the part-word sets MSET_A and MSET_B of words A and B respectively using HowNet; if B ∈ MSET_A or A ∈ MSET_B, the two words have a whole-part relation; otherwise go to step 2-2);
step 2-2) process using HowNet sememe definitions: a definition containing the sememe "part|部件" indicates that the word is a part (component) of some word, and the value of the "whole" attribute in the definition gives the sememe definition of its whole word. Accordingly, extract the sememe definition sets DEFSET_A and DEFSET_B of words A and B; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B such that DEF_A contains the attribute "whole" with value DEF_B, or DEF_B contains the attribute "whole" with value DEF_A, then words A and B have a whole-part relation; otherwise go to step 3-1);
in addition, for words whose whole-part relation cannot be recognized effectively by using the sememe definition directly, a generalized treatment may be applied: the value of the "whole" attribute above is generalized to its hypernym concept, and the remaining operations are unchanged.
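The definition-based test of step 2-2) and its generalized variant can be illustrated with a simplified data model. In the sketch below a HowNet definition is reduced to a hypothetical `Definition` record holding a primary sememe and an optional "whole" value; real HowNet definitions are richer nested expressions. The `PARENT` table and the sample definitions are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Definition:
    head: str                    # primary sememe, e.g. "part"
    whole: Optional[str] = None  # value of the "whole" attribute, if any


# Hypothetical sememe taxonomy: concept -> hypernym concept.
PARENT: Dict[str, str] = {"LandVehicle": "vehicle"}


def points_to(part_def: Definition, whole_def: Definition,
              generalize: bool) -> bool:
    """True if part_def is a 'part' definition whose whole matches whole_def."""
    if part_def.head != "part" or part_def.whole is None:
        return False
    if part_def.whole == whole_def.head:
        return True
    # Generalized variant: lift the "whole" value to its hypernym concept.
    return generalize and PARENT.get(part_def.whole) == whole_def.head


def is_whole_part(defs_a: List[Definition], defs_b: List[Definition],
                  generalize: bool = False) -> bool:
    """Check every pair of definitions in both directions."""
    return any(
        points_to(da, db, generalize) or points_to(db, da, generalize)
        for da in defs_a
        for db in defs_b
    )


wheel = [Definition(head="part", whole="LandVehicle")]
truck = [Definition(head="LandVehicle")]
craft = [Definition(head="vehicle")]

print(is_whole_part(wheel, truck))                   # True
print(is_whole_part(wheel, craft))                   # False
print(is_whole_part(wheel, craft, generalize=True))  # True
```

The last call shows the effect of generalization: the direct comparison fails, but lifting "LandVehicle" to its hypernym concept produces a match.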
Further, in step three, the synonymy relation is determined specifically as follows:
step 3-1) in the HIT Tongyici Cilin (Extended), rows marked with "=" denote synonyms; accordingly, obtain the synonym set SSET_A of word A; if B ∈ SSET_A, words A and B have a synonymy relation; otherwise go to step 3-2);
step 3-2) extract the synonym set SSET_A of word A using HowNet; if B ∈ SSET_A, words A and B have a synonymy relation; otherwise go to step 3-3);
step 3-3) extract the synonym set SSET_A of word A using Baidu Hanyu; if B ∈ SSET_A, words A and B have a synonymy relation; otherwise go to step 3-4);
step 3-4) according to the page links of Baidu Baike, obtain the encyclopedia link page sets PSET_A and PSET_B of words A and B respectively; if they satisfy
[Figure BDA0001381695150000031]
then words A and B have a synonymy relation; otherwise go to step 4-1).
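Step 3-1) depends on reading only the "=" rows of the thesaurus. The sketch below assumes the commonly distributed plain-text layout of the Cilin (Extended), in which each row is a category code followed by space-separated words and the last character of the code is "=" (synonyms), "#" (related words of the same class), or "@" (a word with no synonym). The function name `load_synonyms` is hypothetical and the sample rows are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def load_synonyms(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each word to the union of its '=' rows, excluding the word itself."""
    table: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # a code with fewer than two words yields no pair
        code, words = fields[0], fields[1:]
        if not code.endswith("="):
            continue  # only rows flagged '=' list true synonyms
        for w in words:
            table[w] |= set(words) - {w}
    return table


# Hypothetical rows in the assumed "CODE word word ..." layout.
SAMPLE = [
    "Xx01A01= car automobile motorcar",
    "Xx01A02# truck bus",
    "Xx01A03@ tram",
]

table = load_synonyms(SAMPLE)
print("automobile" in table["car"])  # True
print("bus" in table["truck"])       # False: '#' rows are not synonyms
```

With such a table, the test B ∈ SSET_A of step 3-1) becomes a single set membership check.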
Further, in step four, the hypernym-hyponym relation is determined specifically as follows:
step 4-1) extract the hyponym sets HSET_A and HSET_B of words A and B respectively using HowNet; if B ∈ HSET_A or A ∈ HSET_B, words A and B have a hypernym-hyponym relation; otherwise go to step 4-2);
step 4-2) according to the hypernym-hyponym relations implied by HowNet sememe definitions, extract the sememe definition sets DEFSET_A and DEFSET_B of words A and B respectively; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B whose primary sememes are consistent and which satisfy
[Figure BDA0001381695150000032]
or
[Figure BDA0001381695150000033]
then words A and B have a hypernym-hyponym relation; otherwise go to step 5-1).
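The set test of step 4-1) is symmetric, so it can also report which of the two words is the hypernym. The following sketch covers step 4-1) only; the definition-based test of step 4-2) is not sketched because its conditions are given only as formula images. The function name `hypernym_of` is hypothetical, and the three hyponyms are a small illustrative subset of the hyponyms listed for "motor vehicle" in the embodiment.

```python
from typing import Dict, Optional, Set

# Illustrative subset of HSET for "motor vehicle"; other words have no entry.
HYPONYMS: Dict[str, Set[str]] = {
    "motor vehicle": {"bus", "taxi", "truck"},
}


def hypernym_of(a: str, b: str) -> Optional[str]:
    """Return whichever word is the hypernym, or None if step 4-1 fails."""
    if b in HYPONYMS.get(a, set()):
        return a  # B is a hyponym of A
    if a in HYPONYMS.get(b, set()):
        return b  # A is a hyponym of B
    return None


print(hypernym_of("truck", "motor vehicle"))  # motor vehicle
```

A `None` result corresponds to falling through to step 4-2).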
Further, in step five, the word pair is translated specifically as follows:
step 5-1) translate words A and B into the corresponding English word sets ENSET(A) and ENSET(B) respectively using a Chinese-English dictionary.
Further, in step six, semantic relation recognition with English knowledge resources is performed specifically as follows:
step 6-1) for each pair of English words EN_A ∈ ENSET(A) and EN_B ∈ ENSET(B), extract the antonym set ENASET_A of word EN_A from the English knowledge resources; if EN_B ∈ ENASET_A, the English words EN_A and EN_B have an antonymy relation, that is, the original Chinese word pair has an antonymy relation; otherwise go to step 6-2);
step 6-2) for each pair of English words EN_A ∈ ENSET(A) and EN_B ∈ ENSET(B), extract the part-word sets ENMSET_A and ENMSET_B of words EN_A and EN_B respectively from the English knowledge resources; if EN_B ∈ ENMSET_A or EN_A ∈ ENMSET_B, the English words EN_A and EN_B have a whole-part relation, that is, the original Chinese word pair has a whole-part relation; otherwise go to step 6-3);
step 6-3) for each pair of English words EN_A ∈ ENSET(A) and EN_B ∈ ENSET(B), extract the synonym set ENSSET_A of word EN_A from the English knowledge resources; if EN_B ∈ ENSSET_A, the English words EN_A and EN_B have a synonymy relation, that is, the original Chinese word pair has a synonymy relation; otherwise go to step 6-4);
step 6-4) for each pair of English words EN_A ∈ ENSET(A) and EN_B ∈ ENSET(B), extract the hyponym sets ENHSET_A and ENHSET_B of words EN_A and EN_B respectively from the English knowledge resources; if EN_B ∈ ENHSET_A or EN_A ∈ ENHSET_B, the English words EN_A and EN_B have a hypernym-hyponym relation, that is, the original Chinese word pair has a hypernym-hyponym relation.
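Steps 5-1) and 6-1) to 6-4) amount to translating each word into a set of English candidates and then testing every candidate pair, one relation at a time. The sketch below is illustrative only: `ZH_EN` and `EN_RESOURCE` are invented in-memory tables rather than a real dictionary or a real English resource, the Chinese keys are assumed forms of the example words, and `recognize_via_english` is a hypothetical name.

```python
from itertools import product
from typing import Dict, Optional, Set

# Hypothetical Chinese-English dictionary entries (step 5-1).
ZH_EN: Dict[str, Set[str]] = {
    "机动车": {"motor vehicle", "automobile"},
    "卡车": {"truck", "lorry"},
}

# Hypothetical English resource: relation -> word -> related words.
EN_RESOURCE: Dict[str, Dict[str, Set[str]]] = {
    "antonymy": {},
    "whole-part": {},
    "synonymy": {"truck": {"lorry"}},
    "hypernym-hyponym": {"motor vehicle": {"truck", "bus"}},
}

# Relations that the method tests in both directions (steps 6-2 and 6-4).
SYMMETRIC = {"whole-part", "hypernym-hyponym"}

# The order mirrors steps 6-1 to 6-4.
ORDER = ("antonymy", "whole-part", "synonymy", "hypernym-hyponym")


def recognize_via_english(a: str, b: str) -> Optional[str]:
    enset_a = ZH_EN.get(a, set())
    enset_b = ZH_EN.get(b, set())
    for relation in ORDER:
        table = EN_RESOURCE[relation]
        # Try every translation pair before moving to the next relation.
        for en_a, en_b in product(enset_a, enset_b):
            hit = en_b in table.get(en_a, set())
            if not hit and relation in SYMMETRIC:
                hit = en_a in table.get(en_b, set())
            if hit:
                return relation
    return None


print(recognize_via_english("机动车", "卡车"))  # hypernym-hyponym
```

Looping over relations on the outside and translation pairs on the inside keeps the priority order of the method: a lower-priority relation is reported only if no translation pair satisfies a higher-priority one.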
A Chinese word semantic relation recognition device combining Chinese and English knowledge resources comprises:
an antonymy relation recognition unit, used for acquiring an antonym set from multiple Chinese knowledge resources and judging from the antonym set whether an antonymy relation exists between the words;
a whole-part relation recognition unit, used for extracting a part-word set from multiple Chinese knowledge resources and judging from the part-word set whether a whole-part relation exists between the words;
a synonymy relation recognition unit, used for extracting a synonym set from multiple Chinese knowledge resources and judging from the synonym set whether a synonymy relation exists between the words;
a hypernym-hyponym relation recognition unit, used for extracting a hyponym set from multiple Chinese knowledge resources and judging from the hyponym set whether a hypernym-hyponym relation exists between the words;
a Chinese-English translation unit, used for translating the Chinese word pair into English using a Chinese-English dictionary;
and an English word semantic relation recognition unit, used for recognizing the semantic relation of the English word pair obtained by the Chinese-English translation unit using English knowledge resources, so as to determine the semantic relation of the original Chinese word pair.
Further, the antonymy relation recognition unit further comprises:
a HowNet antonymy relation recognition unit, used for extracting, for given words A and B, the antonym set ASET_A of word A using the antonymy relation explicitly defined in HowNet; if B ∈ ASET_A, the two words have an antonymy relation; otherwise control passes to the Baidu Hanyu antonymy relation recognition unit. In addition, the converse relation defined in HowNet is also treated as an antonymy relation;
a Baidu Hanyu antonymy relation recognition unit, used for extracting the antonym set ASET_A of the given word A using Baidu Hanyu, extracting the synonym set SSET_A of word A using the HIT Tongyici Cilin (Extended), and, for each word W ∈ SSET_A, extracting its antonyms and merging them into ASET_A; if B ∈ ASET_A, words A and B have an antonymy relation; otherwise control passes to the Baidu Baike antonymy relation recognition unit;
a Baidu Baike antonymy relation recognition unit, used for extracting the antonym set ASET_A of word A using Baidu Baike; if B ∈ ASET_A, the two words have an antonymy relation; otherwise control passes to the whole-part relation recognition unit.
Further, the whole-part relation recognition unit further comprises:
a HowNet whole-part relation recognition unit, used for extracting the part-word sets MSET_A and MSET_B of words A and B respectively using HowNet; if B ∈ MSET_A or A ∈ MSET_B, the two words have a whole-part relation; otherwise control passes to the sememe-definition whole-part relation recognition unit;
a sememe-definition whole-part relation recognition unit, used for processing with HowNet sememe definitions: a definition containing the sememe "part|部件" indicates that the word is a part (component) of some word, and the value of the "whole" attribute in the definition gives the sememe definition of its whole word; accordingly, the sememe definition sets DEFSET_A and DEFSET_B of words A and B are extracted; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B such that DEF_A contains the attribute "whole" with value DEF_B, or DEF_B contains the attribute "whole" with value DEF_A, then words A and B have a whole-part relation; otherwise control passes to the synonymy relation recognition unit;
in addition, in the sememe-definition whole-part relation recognition unit, for words whose whole-part relation cannot be recognized effectively by using the sememe definition directly, a generalized treatment may be applied: the value of the "whole" attribute above is generalized to its hypernym concept, and the remaining operations are unchanged.
Further, the synonymy relation recognition unit further comprises:
a Cilin synonymy relation recognition unit, used for obtaining the synonym set SSET_A of word A from the rows marked with "=" in the HIT Tongyici Cilin (Extended); if B ∈ SSET_A, words A and B have a synonymy relation; otherwise control passes to the HowNet synonymy relation recognition unit;
a HowNet synonymy relation recognition unit, used for extracting the synonym set SSET_A of word A using HowNet; if B ∈ SSET_A, words A and B have a synonymy relation; otherwise control passes to the Baidu Hanyu synonymy relation recognition unit;
a Baidu Hanyu synonymy relation recognition unit, used for extracting the synonym set SSET_A of word A using Baidu Hanyu; if B ∈ SSET_A, words A and B have a synonymy relation; otherwise control passes to the Baidu Baike synonymy relation recognition unit;
a Baidu Baike synonymy relation recognition unit, used for obtaining the encyclopedia link page sets PSET_A and PSET_B of words A and B respectively according to the page links of Baidu Baike; if they satisfy
[Figure BDA0001381695150000051]
then words A and B have a synonymy relation; otherwise control passes to the hypernym-hyponym relation recognition unit.
Further, the hypernym-hyponym relation recognition unit further comprises:
a HowNet hypernym-hyponym relation recognition unit, used for extracting the hyponym sets HSET_A and HSET_B of words A and B respectively using HowNet; if B ∈ HSET_A or A ∈ HSET_B, words A and B have a hypernym-hyponym relation; otherwise control passes to the sememe-definition hypernym-hyponym relation recognition unit;
a sememe-definition hypernym-hyponym relation recognition unit, used for extracting the sememe definition sets DEFSET_A and DEFSET_B of words A and B respectively according to the hypernym-hyponym relations implied by HowNet sememe definitions; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B whose primary sememes are consistent and which satisfy
[Figure BDA0001381695150000052]
or
[Figure BDA0001381695150000053]
then words A and B have a hypernym-hyponym relation; otherwise control passes to the Chinese-English translation unit.
Further, the Chinese-English translation unit further comprises:
a translation unit, used for translating words A and B into the corresponding English word sets ENSET(A) and ENSET(B) respectively using a Chinese-English dictionary.
Further, the English word semantic relation recognition unit further comprises:
an English antonymy relation recognition unit, used for extracting, for each pair of English words EN_A ∈ ENSET(A) and EN_B ∈ ENSET(B), the antonym set ENASET_A of word EN_A from the English knowledge resources; if EN_B ∈ ENASET_A, the English words EN_A and EN_B have an antonymy relation, that is, the original Chinese word pair has an antonymy relation; otherwise control passes to the English whole-part relation recognition unit;
an English whole-part relation recognition unit, used for extracting, for each pair of English words EN_A ∈ ENSET(A) and EN_B ∈ ENSET(B), the part-word sets ENMSET_A and ENMSET_B of words EN_A and EN_B respectively from the English knowledge resources; if EN_B ∈ ENMSET_A or EN_A ∈ ENMSET_B, the English words EN_A and EN_B have a whole-part relation, that is, the original Chinese word pair has a whole-part relation; otherwise control passes to the English synonymy relation recognition unit;
an English synonymy relation recognition unit, used for extracting, for each pair of English words EN_A ∈ ENSET(A) and EN_B ∈ ENSET(B), the synonym set ENSSET_A of word EN_A from the English knowledge resources; if EN_B ∈ ENSSET_A, the English words EN_A and EN_B have a synonymy relation, that is, the original Chinese word pair has a synonymy relation; otherwise control passes to the English hypernym-hyponym relation recognition unit;
an English hypernym-hyponym relation recognition unit, used for extracting, for each pair of English words EN_A ∈ ENSET(A) and EN_B ∈ ENSET(B), the hyponym sets ENHSET_A and ENHSET_B of words EN_A and EN_B respectively from the English knowledge resources; if EN_B ∈ ENHSET_A or EN_A ∈ ENHSET_B, the English words EN_A and EN_B have a hypernym-hyponym relation, that is, the original Chinese word pair has a hypernym-hyponym relation.
The beneficial effects of the invention are as follows:
1. The invention uses multiple Chinese knowledge resources to recognize the semantic relations of words and makes full use of each knowledge resource.
2. In whole-part relation recognition, the method is supplemented by a generalization step tailored to the characteristics of HowNet sememe definitions, which improves the adaptability of the recognition method.
3. In hypernym-hyponym relation recognition, the invention fully mines the information contained in HowNet sememe definitions, which effectively improves recognition accuracy.
4. The invention combines Chinese and English knowledge resources and uses the English knowledge resources to supplement word semantic relations not covered by the Chinese knowledge resources, which improves the recognition rate.
5. The Chinese word semantic relation recognition method and device combining Chinese and English knowledge resources provided by the invention can automatically recognize the semantic relation of a given word pair, including the antonymy relation, the whole-part relation, the synonymy relation, and the hypernym-hyponym relation, with high recognition accuracy.
Drawings
FIG. 1 is a flow chart of a method for recognizing semantic relations of Chinese words in combination with Chinese-English knowledge resources according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a Chinese term semantic relationship recognition apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the structure of an antisense relation recognition unit according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a whole-part relation recognition unit according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a synonym relationship identification unit according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a hypernym-hyponym relation recognition unit according to an embodiment of the present invention;
FIG. 7 is a block diagram of a Chinese-to-English translation unit according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an English word semantic relation recognition unit according to an embodiment of the present invention;
Detailed Description
In order that those skilled in the art may better understand the scheme of the embodiments of the invention, the embodiments are described in detail below with reference to the accompanying drawings and implementation modes.
The semantic relation recognition process is illustrated with the word pair consisting of word A "motor vehicle" and word B "truck".
The flow chart of the Chinese word semantic relation recognition method combining Chinese and English knowledge resources according to an embodiment of the invention is shown in FIG. 1 and comprises the following steps:
Step 101, antonymy relation recognition.
An antonym set is acquired from multiple Chinese knowledge resources, and whether an antonymy relation exists between the words is judged from the antonym set, specifically:
step 1-1) for given words A and B, extract the antonym set ASET_A of word A using the antonymy relation explicitly defined in HowNet; if B ∈ ASET_A, the two words have an antonymy relation; otherwise go to step 1-2). In addition, the converse relation defined in HowNet is also treated as an antonymy relation;
Extracting the antonyms (including converse words) of word A "motor vehicle" from HowNet gives ASET_A = { "trailer", "cart", "dongfu car", "wheelbarrow", "rickshaw", "yellow croaker", "skeleton car", "rubber car", "bicycle", "horse car", "donkey car", "cow car", "volleyball", "flatbed car", "flatbed tricycle", "rickshaw", "tricycle", "mountain bike", "handcart", "animal car", "cart", "trolley", "ocean car", "moped", "bicycle", "a light horse cart", "chariot", "hub", "halter strap" }. Clearly word B "truck" ∉ ASET_A [Figure BDA0001381695150000075], so step 1-2) is performed.
Step 1-2) extract the antonym set ASET_A of the given word A using Baidu Hanyu; extract the synonym set SSET_A of word A using the HIT Tongyici Cilin (Extended); for each word W ∈ SSET_A, extract its antonyms and merge them into ASET_A; if B ∈ ASET_A, words A and B have an antonymy relation; otherwise go to step 1-3);
Extracting the antonyms of word A "motor vehicle" from Baidu Hanyu gives
[Figure BDA0001381695150000071]
Since word B "truck" is not in the resulting ASET_A
[Figure BDA0001381695150000076]
[Figure BDA0001381695150000077]
step 1-3) is performed.
Step 1-3) extract the antonym set ASET_A of word A using Baidu Baike; if B ∈ ASET_A, the two words have an antonymy relation; otherwise go to step 2-1).
Extracting the antonyms of word A "motor vehicle" from Baidu Baike gives
[Figure BDA0001381695150000072]
Since word B "truck" is not in the resulting ASET_A
[Figure BDA0001381695150000073]
[Figure BDA0001381695150000074]
go to step 2-1).
Step 102, whole-part relation recognition.
A part-word set is extracted from multiple Chinese knowledge resources, and whether a whole-part relation exists between the words is judged from the part-word set, specifically:
step 2-1) extract the part-word sets MSET_A and MSET_B of words A and B respectively using HowNet; if B ∈ MSET_A or A ∈ MSET_B, the two words have a whole-part relation; otherwise go to step 2-2);
Extracting the part-word sets of words A "motor vehicle" and B "truck" from HowNet gives MSET_A = { "headlight", "steering wheel", "sun visor", "trunk", "rear window", "tailgate", "rear lamp", "rear mirror", "cab", "sidecar", "straddle bucket", "automobile engine", "automobile horn", "automobile accessory", "cylinder", "headlight", "fuel gauge", "tail lamp", "trunk", "throttle", "sun visor top" } and
[Figure BDA0001381695150000081]
Since neither the test for B "truck" against MSET_A
[Figure BDA0001381695150000082]
nor the test for A "motor vehicle" against MSET_B
[Figure BDA0001381695150000083]
succeeds, go to step 2-2).
Step 2-2) process using HowNet sememe definitions: a definition containing the sememe "part|部件" indicates that the word is a part (component) of some word, and the value of the "whole" attribute in the definition gives the sememe definition of its whole word. Accordingly, extract the sememe definition sets DEFSET_A and DEFSET_B of words A and B; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B such that DEF_A contains the attribute "whole" with value DEF_B, or DEF_B contains the attribute "whole" with value DEF_A, then words A and B have a whole-part relation; otherwise go to step 3-1);
in addition, for words whose whole-part relation cannot be recognized effectively by using the sememe definition directly, a generalized treatment may be applied: the value of the "whole" attribute above is generalized to its hypernym concept, and the remaining operations are unchanged.
Using HowNet, the sememe definition sets of words A "motor vehicle" and B "truck" are extracted as DEFSET_A(LandVehicle { "{ automotive ═ automatic } }" } and DEFSET_B{ LandVehicle { [ automatic ], { transport | transport { [ from }, and (physical | substance } } } "}. Clearly there is no DEF_A ∈ DEFSET_A or DEF_B ∈ DEFSET_B that contains the sememe "part|部件", so go to step 3-1).
Step 103, synonymy relation recognition.
A synonym set is extracted from multiple Chinese knowledge resources, and whether a synonymy relation exists between the words is judged from the synonym set, specifically:
step 3-1) in the HIT Tongyici Cilin (Extended), rows marked with "=" denote synonyms; accordingly, obtain the synonym set SSET_A of word A; if B ∈ SSET_A, words A and B have a synonymy relation; otherwise go to step 3-2);
Extracting the synonym set of word A "motor vehicle" from the HIT Tongyici Cilin (Extended) gives
[Figure BDA0001381695150000084]
Since B "truck" is not in the resulting SSET_A
[Figure BDA0001381695150000085]
go to step 3-2).
Step 3-2) extract the synonym set SSET_A of word A using HowNet; if B ∈ SSET_A, words A and B have a synonymy relation; otherwise go to step 3-3);
In HowNet, the synonym set of word A "motor vehicle" is extracted as SSET_A = { "motor vehicle", "automobile", "car", "sleeper" }. Since B "truck" ∉ SSET_A
[Figure BDA0001381695150000086]
go to step 3-3).
Step 3-3) extract the synonym set SSET_A of word A using Baidu Hanyu; if B ∈ SSET_A, words A and B have a synonymy relation; otherwise go to step 3-4);
In Baidu Hanyu, the synonym set of word A "motor vehicle" is extracted as
[Figure BDA0001381695150000091]
Since B "truck" is not in the resulting SSET_A
[Figure BDA0001381695150000092]
go to step 3-4).
Step 3-4) According to the page links of Baidu Encyclopedia, obtain the encyclopedia link page sets PSET_A and PSET_B of the words A and B respectively. If PSET_A ∩ PSET_B ≠ ∅, the words A and B have a synonymy relation; otherwise, go to step 4-1).
In Baidu Encyclopedia, the encyclopedia link page sets of the words A "motor vehicle" and B "truck" are extracted respectively as PSET_A = { "https://baike… } and PSET_B = { "https://baike.baidu.com/item/truck/4339", "https://baike.baidu.com/item/truck/15281831", "https://baike.baidu.com/item/truck/622401", "https://baike.baidu.com/item/truck/3697802", "https://baike.baidu.com/item/truck/7109303", "https://baike.baidu.com/item/truck/3697784" }. Since PSET_A ∩ PSET_B = ∅, go to step 4-1).
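The cascade of steps 3-1) to 3-4) can be sketched as follows. This is only a minimal illustration: the lookup functions, the toy resource tables and the page identifiers are hypothetical stand-ins for the HIT Tongyici Cilin (Extended), HowNet, Baidu Chinese and Baidu Encyclopedia, and the test in step 3-4) is assumed to be a non-empty intersection of the two link page sets.

```python
from typing import Callable, Iterable, Set

# Hypothetical lookup type: maps a word to its synonym set in one resource.
SynonymLookup = Callable[[str], Set[str]]


def is_synonym(a: str, b: str,
               lookups: Iterable[SynonymLookup],
               link_pages: Callable[[str], Set[str]]) -> bool:
    """Try each resource in order and stop at the first hit."""
    for lookup in lookups:                 # steps 3-1) to 3-3)
        if b in lookup(a):
            return True
    # Step 3-4): assumed condition, the two link page sets share a page.
    return bool(link_pages(a) & link_pages(b))


# Toy tables (illustrative data only, not taken from the real resources).
cilin = {"motor vehicle": set()}
hownet = {"motor vehicle": {"motor vehicle", "automobile", "car"}}
baidu_chinese = {}
pages = {"motor vehicle": {"page/1"}, "truck": {"page/2"}, "automobile": {"page/1"}}

lookups = [lambda w, r=r: r.get(w, set()) for r in (cilin, hownet, baidu_chinese)]
link_pages = lambda w: pages.get(w, set())

print(is_synonym("motor vehicle", "truck", lookups, link_pages))       # False
print(is_synonym("motor vehicle", "automobile", lookups, link_pages))  # True
```

With these toy tables the pair "motor vehicle" and "truck" fails every test, which mirrors the walk-through above, where control then passes to step 4-1).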
Step 104: identifying the superior-inferior relation.
Extracting a hyponym set by means of various Chinese knowledge resources, and judging whether the words have a superior-subordinate relation according to the hyponym set, wherein the method specifically comprises the following steps:
Step 4-1) Extract the hyponym sets HSET_A and HSET_B of the words A and B respectively using HowNet. If B ∈ HSET_A or A ∈ HSET_B, the words A and B have a superior-inferior relation; otherwise, go to step 4-2).
In HowNet, the hyponyms of the words A "motor vehicle" and B "truck" are extracted respectively, giving HSET_A = { "audi", "bus", "regular bus", "charter", "bmw", "galloping", "honk", "coach", "taxi", "coach", "universe", "taxi", "tram", "bus", "black bus", "truck", "container van", "air conditioner", "commercial vehicle", "bus", "ambulance", "taxi", "police car", "ambulance", "fire truck", "old car", "truck", "cadilac", "air vehicle", "forest", "hopper car", "station wagon", "tourist car", "merseidess", "minibus", "shuttle car", "double-deck bus", "private car", "commuter car", "shuttle bus", "trolley bus", "modern train", "fire truck", "minibus", "small bus", "passenger car", "cruiser", "tourist car", "cross-country vehicle", "transport vehicle", "truck", "automatic dump truck", "dump truck", "train" }
Figure BDA0001381695150000095
Since the word B "truck" ∈ HSET_A, the words A "motor vehicle" and B "truck" have a superior-inferior relation, and the semantic relation recognition operation is complete at this point.
Step 4-2) According to the superior-inferior relation implied by the HowNet sememe definitions, extract the sememe definition sets DEFSET_A and DEFSET_B of the words A and B respectively. If there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B whose primary sememes are identical, and the set of sememes contained in DEF_A is a proper subset of the set of sememes contained in DEF_B, or the set of sememes contained in DEF_B is a proper subset of the set of sememes contained in DEF_A, then the words A and B have a superior-inferior relation.
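The test in step 4-2) can be illustrated with a short sketch. It assumes that each HowNet definition has already been flattened into a primary sememe plus the set of all sememes it contains; the `Definition` class and the sample definitions are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Definition:
    """Hypothetical flattened view of one HowNet definition (DEF)."""
    primary: str               # primary sememe of the definition
    sememes: FrozenSet[str]    # all sememes that appear in the definition


def is_superior_inferior(defs_a: Iterable[Definition],
                         defs_b: Iterable[Definition]) -> bool:
    """Same primary sememe, and one sememe set is a proper subset of the other."""
    defs_b = list(defs_b)
    for da in defs_a:
        for db in defs_b:
            if da.primary != db.primary:
                continue
            # "<" on frozensets is the proper-subset test.
            if da.sememes < db.sememes or db.sememes < da.sememes:
                return True
    return False


# Illustrative definitions only.
vehicle = Definition("LandVehicle", frozenset({"LandVehicle"}))
truck = Definition("LandVehicle", frozenset({"LandVehicle", "transport"}))
tool = Definition("tool", frozenset({"tool", "transport"}))

print(is_superior_inferior([vehicle], [truck]))    # True
print(is_superior_inferior([vehicle], [vehicle]))  # False
print(is_superior_inferior([truck], [tool]))       # False
```

The proper-subset requirement is what separates a superior-inferior pair from two words that simply share an identical definition.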
Step 105: Chinese-English translation.
The Chinese word pair is translated into English using a Chinese-English dictionary, specifically as follows:
step 5-1) translating the words A and B into corresponding English sets ENSET (A) and ENSET (B) respectively by using a Chinese-English dictionary.
Step 106: identifying the semantic relation of the English words.
And (4) performing word semantic relation recognition on the English word pair obtained in the step five by using English knowledge resources to determine the semantic relation of the original Chinese word pair, wherein the method specifically comprises the following steps:
Step 6-1) For each pair of English words EN_A ∈ ENSET(A) and EN_B ∈ ENSET(B), extract the antisense word set ENASET_A of the word EN_A from the English knowledge resource. If EN_B ∈ ENASET_A, the English words EN_A and EN_B have an antisense relation, i.e., the original Chinese word pair has an antisense relation; otherwise, go to step 6-2).
Step 6-2) For each pair of English words EN_A ∈ ENSET(A) and EN_B ∈ ENSET(B), extract the partial word sets ENMSET_A and ENMSET_B of the words EN_A and EN_B respectively from the English knowledge resource. If EN_B ∈ ENMSET_A or EN_A ∈ ENMSET_B, the English words EN_A and EN_B have an integral part relation, i.e., the original Chinese word pair has an integral part relation; otherwise, go to step 6-3).
Step 6-3) For each pair of English words EN_A ∈ ENSET(A) and EN_B ∈ ENSET(B), extract the synonym set ENSSET_A of the word EN_A from the English knowledge resource. If EN_B ∈ ENSSET_A, the English words EN_A and EN_B have a synonymy relation, i.e., the original Chinese word pair has a synonymy relation; otherwise, go to step 6-4).
Step 6-4) For each pair of English words EN_A ∈ ENSET(A) and EN_B ∈ ENSET(B), extract the hyponym sets ENHSET_A and ENHSET_B of the words EN_A and EN_B respectively from the English knowledge resource. If EN_B ∈ ENHSET_A or EN_A ∈ ENHSET_B, the English words EN_A and EN_B have a superior-inferior relation, i.e., the original Chinese word pair has a superior-inferior relation.
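Steps 6-1) to 6-4) amount to an ordered search over all translation pairs. The sketch below is one possible reading, in which each relation is tested against every pair before the next relation is tried. The `related` callback and the toy table are hypothetical; a real system would query an English resource such as BabelNet instead.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Set

# (relation label, test both directions?) in the order of steps 6-1) to 6-4).
STEPS = [
    ("antisense", False),
    ("integral part", True),
    ("synonymy", False),
    ("superior-inferior", True),
]


def english_relation(enset_a: Iterable[str], enset_b: Iterable[str],
                     related: Callable[[str, str], Set[str]]) -> Optional[str]:
    """Return the first relation that holds for some pair (EN_A, EN_B), else None."""
    pairs = list(product(enset_a, enset_b))
    for relation, both_directions in STEPS:
        for en_a, en_b in pairs:
            if en_b in related(relation, en_a):
                return relation
            if both_directions and en_a in related(relation, en_b):
                return relation
    return None


# Toy table: hyponyms of "condiment" (illustrative data only).
toy = {("superior-inferior", "condiment"): {"mustard", "vinegar", "wasabi"}}
related = lambda relation, word: toy.get((relation, word), set())

print(english_relation({"condiment", "seasoning"}, {"vinegar", "jealousy"}, related))
# superior-inferior
```

Because the function returns on the first hit, the order of `STEPS` fixes the priority among the four relations.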
Similarly, the semantic relation recognition operation for the word pair "person" and "head" can be completed. To illustrate the generalization operation specifically, the following goes directly to step 2-2):
In HowNet, the sememe definition sets of the words A "person" and B "head" are extracted respectively as DEFSET_A = { "{ Behavior { }," { physicque |: host ═ animal } }, "{ Strength | force: host ═ community } }," { human | person }, "{ human | person: persona ═ 3rdPerson }", "{ human | person: persona ═ 3rdPerson } }," { human | person: person ═ 3rdPerson }, "{ human ═ person ═ 3rdPerson }, quality }," { human | person }, "{ human |: modifier { (adolt | }," "{ human:, { human { (quality { }," { human { (special }, { human }, { factor { }, { quality }, { human { }, { human:, { quality { }, { human:, { factor }, { factor:, { quality }, { factor } and DEFSET_B = { "part" head ", hold" animal " }. Clearly, there are no DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B such that DEF_A contains the attribute "whole" with the value DEF_B, or DEF_B contains the attribute "whole" with the value DEF_A. A generalization operation is therefore performed: the definition DEF_A "human" is generalized to its superordinate concept "AnimalHuman | animal". There then exists DEF_B ∈ DEFSET_B that contains the attribute "whole" with the value "{ AnimalHuman | animal }", so the words "person" and "head" have an integral part relation.
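The generalization in this example can be sketched as below. The sketch makes simplifying assumptions: each definition is reduced to its primary sememe plus an optional "whole" value, the sememe hierarchy is a hypothetical child-to-parent table named `PARENT`, and, following the worked example, it is the candidate whole word's sememe that is walked up the hierarchy.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional


@dataclass(frozen=True)
class Def:
    """Hypothetical reduced form of a HowNet definition."""
    concept: str                  # primary sememe, e.g. "human"
    whole: Optional[str] = None   # value of the "whole" attribute, if present


# Hypothetical fragment of the sememe hierarchy (child -> parent).
PARENT: Dict[str, str] = {"human": "AnimalHuman"}


def generalizations(concept: Optional[str]) -> Iterator[str]:
    """Yield the concept itself, then each of its ancestors."""
    while concept is not None:
        yield concept
        concept = PARENT.get(concept)


def is_integral_part(defs_a: Iterable[Def], defs_b: Iterable[Def]) -> bool:
    """True if one word's "whole" value names the other word's concept or an ancestor of it."""
    defs_a, defs_b = list(defs_a), list(defs_b)
    for whole_defs, part_defs in ((defs_a, defs_b), (defs_b, defs_a)):
        for dw in whole_defs:
            candidates = set(generalizations(dw.concept))
            for dp in part_defs:
                if dp.whole is not None and dp.whole in candidates:
                    return True
    return False


person = [Def("human")]
head = [Def("part", whole="AnimalHuman")]
print(is_integral_part(person, head))  # True
```

Without the walk up the hierarchy, "human" would never equal "AnimalHuman" and the pair would fall through to step 3-1).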
For the word pairs which can not be identified by Chinese knowledge resources, the semantic relation identification operation is completed through the steps 5-1) to 6-4), and the following words A 'seasoning' and B 'vinegar' are taken as examples for explanation:
According to step 5-1), the words A "seasoning" and B "vinegar" are translated into the corresponding English sets using the Iciba Chinese-English dictionary and HowNet, giving ENSET(A) = { "access food", "conditions", "seasoning" } and ENSET(B) = { "vinegar", "jealousy" }.
According to step 6-1), the antisense word sets ENASET_A of EN_A "access food", EN_A "conditions" and EN_A "seasoning" are extracted from the English knowledge resource BabelNet (Figures BDA0001381695150000111, BDA0001381695150000112 and BDA0001381695150000113). Clearly, no word EN_B ∈ ENSET(B) belongs to any of these sets (Figure BDA0001381695150000114), so go to step 6-2).
According to step 6-2), the partial word sets ENMSET_A of EN_A "access food", EN_A "conditions" and EN_A "seasoning" (Figures BDA0001381695150000115, BDA0001381695150000116 and BDA0001381695150000117) and the partial word sets ENMSET_B of EN_B "vinegar" and EN_B "jealousy" (Figures BDA0001381695150000118 and BDA0001381695150000119) are extracted from the English knowledge resource BabelNet. Clearly, no EN_B belongs to any ENMSET_A and no EN_A belongs to any ENMSET_B (Figures BDA00013816951500001110 and BDA00013816951500001111), so go to step 6-3).
According to step 6-3), the synonym sets of EN_A "access food" and EN_A "conditions" are extracted from the English knowledge resource BabelNet (Figures BDA00013816951500001112 and BDA00013816951500001113), and the synonym set of EN_A "seasoning" is extracted as ENSSET_A = { "flavour", "seaspring" }. Since no word EN_B ∈ ENSET(B) belongs to any of these sets (Figure BDA00013816951500001115), go to step 6-4).
According to step 6-4), the hyponym set of EN_A "access food" is extracted from the English knowledge resource BabelNet (Figure BDA00013816951500001114), and the hyponym set of EN_A "conditions" is extracted as ENHSET_A = { "relish", "dip", "mustard", "table mustard", "task", "button up", "button up", "child show", "chunk", "inductive repeat", "step show", "taco show", "salt", "minute show", "creating show", "reduce show", "hostin show", "hoseradist", "marinadate", "source", "video", "environment", "monitor", "spectrum", "pass", "wasabi" }. Since EN_B "vinegar" ∈ ENHSET_A, the English words EN_A and EN_B have a superior-inferior relation, i.e., the original Chinese word pair has a superior-inferior relation.
Through the above operation steps, the semantic relation recognition work of the given word pair can be completed.
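Taken together, the procedure is a fallback chain: the Chinese resources are consulted first, and only an unrecognized pair is translated and handed to the English stage. The sketch below shows that control flow under stated assumptions. All names (`translate`, `recognize`, the placeholder keys `word_A` and `word_B`, and the stub checkers) are hypothetical, and the dictionaries are toy tables rather than the Iciba dictionary or HowNet.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Check = Callable[[str, str], bool]


def translate(word: str, dictionaries: List[Dict[str, Set[str]]]) -> Set[str]:
    """Union of the English translations offered by every dictionary."""
    english: Set[str] = set()
    for dictionary in dictionaries:
        english |= dictionary.get(word, set())
    return english


def recognize(a: str, b: str,
              chinese_checks: List[Tuple[str, Check]],
              dictionaries: List[Dict[str, Set[str]]],
              english_stage: Callable[[Set[str], Set[str]], Optional[str]]) -> Optional[str]:
    """Chinese checks in order; fall back to the English stage if none succeeds."""
    for relation, check in chinese_checks:
        if check(a, b):
            return relation
    return english_stage(translate(a, dictionaries), translate(b, dictionaries))


# Stubs: every Chinese-resource check fails for this pair (illustrative only).
chinese_checks = [(name, lambda a, b: False)
                  for name in ("antisense", "integral part", "synonymy", "superior-inferior")]
dictionaries = [{"word_A": {"seasoning"}},
                {"word_A": {"condiment"}, "word_B": {"vinegar"}}]
english_stage = lambda ea, eb: ("superior-inferior"
                                if "condiment" in ea and "vinegar" in eb else None)

print(recognize("word_A", "word_B", chinese_checks, dictionaries, english_stage))
# superior-inferior
```

Merging the translations from several dictionaries enlarges the set of English candidate pairs, which is what lets the English stage succeed where a single translation might not.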
Correspondingly, the embodiment of the invention also provides a device for identifying the semantic relation of the Chinese words by combining Chinese and English knowledge resources, and the structural schematic diagram of the device is shown in FIG. 2.
In this embodiment, the apparatus comprises:
an antisense relation recognition unit 201, configured to obtain an antisense word set using multiple chinese knowledge resources, and determine whether a semantic relation between words has an antisense relation according to the antisense word set;
an integral part relation recognition unit 202, configured to extract a partial word set using multiple chinese knowledge resources, and determine whether there is an integral part relation between words according to the partial word set;
the synonymy relation recognition unit 203 is used for extracting a synonymy set by utilizing various Chinese knowledge resources and judging whether synonymy relations exist among the words or not based on the synonymy set;
a superior-inferior relation identification unit 204, configured to extract a subordinate word set by means of multiple chinese knowledge resources, and determine whether there is a superior-inferior relation between words according to the subordinate word set;
a Chinese-English translation unit 205 for translating the Chinese word pairs into English using a Chinese-English dictionary;
the english word semantic relation recognition unit 206 is configured to perform word semantic relation recognition on the english word pair obtained by the chinese-english translation unit by using an english knowledge resource to determine a semantic relation of the original chinese word pair.
The schematic structure diagram of the antisense relation recognition unit 201 of the device shown in fig. 2 is shown in fig. 3, and it includes:
a HowNet antisense relation recognition unit 301, configured to extract, for given words A and B, the antisense word set ASET_A of the word A using the antisense relations explicitly defined in HowNet; if B ∈ ASET_A, the two words have an antisense relation; otherwise, control passes to the Baidu Chinese antisense relation recognition unit; in addition, the converse relation defined in HowNet is also treated as an antisense relation;
a Baidu Chinese antisense relation recognition unit 302, configured to extract the antisense word set ASET_A of the given word A using Baidu Chinese, extract the synonym set SSET_A of the word A using the HIT Tongyici Cilin (Extended), and, for each word W ∈ SSET_A, extract its antisense words and merge them into ASET_A; if B ∈ ASET_A, the words A and B have an antisense relation; otherwise, control passes to the Baidu Encyclopedia antisense relation recognition unit;
a Baidu Encyclopedia antisense relation recognition unit 303, configured to extract the antisense word set ASET_A of the word A using Baidu Encyclopedia; if B ∈ ASET_A, the two words have an antisense relation; otherwise, control passes to the integral part relation recognition unit.
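The set expansion performed by unit 302 can be sketched in a few lines: the antisense words of A are collected first, and then the antisense words of every synonym of A are merged in. The two lookup callbacks and the toy tables are hypothetical placeholders for Baidu Chinese and the synonym resource.

```python
from typing import Callable, Set


def antisense_set(word: str,
                  antonyms_of: Callable[[str], Set[str]],
                  synonyms_of: Callable[[str], Set[str]]) -> Set[str]:
    """ASET of `word`, enlarged with the antisense words of each of its synonyms."""
    aset = set(antonyms_of(word))
    for w in synonyms_of(word):
        aset |= antonyms_of(w)
    return aset


# Toy tables (illustrative data only).
antonyms = {"happy": {"sad"}, "glad": {"unhappy"}}
synonyms = {"happy": {"glad"}}

aset = antisense_set("happy",
                     lambda w: antonyms.get(w, set()),
                     lambda w: synonyms.get(w, set()))
print(sorted(aset))  # ['sad', 'unhappy']
```

The expansion trades some precision for coverage, since an antisense word of a near-synonym is accepted as an antisense word of A itself.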
Fig. 4 is a schematic structural diagram of the whole part relationship identification unit 202 of the apparatus shown in fig. 2, which includes:
a HowNet integral part relation recognition unit 401, configured to extract the partial word sets MSET_A and MSET_B of the words A and B respectively using HowNet; if B ∈ MSET_A or A ∈ MSET_B, the two words have an integral part relation; otherwise, control passes to the sememe definition integral part relation recognition unit;
a sememe definition integral part relation recognition unit 402, configured to process the HowNet sememe definitions: a definition that contains the sememe "part" indicates that the word is a partial word of some word, and the value of the "whole" attribute in the definition gives the sememe definition of its whole word; on this basis, the sememe definition sets DEFSET_A and DEFSET_B of the words A and B are extracted; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B such that DEF_A contains the attribute "whole" with the value DEF_B, or DEF_B contains the attribute "whole" with the value DEF_A, the words A and B have an integral part relation; otherwise, control passes to the synonymy relation recognition unit;
in addition, in the sememe definition integral part relation recognition unit, for some words the integral part relation cannot be effectively recognized by directly using the sememe definitions; such words can be handled by generalization, in which the value of the "whole" attribute described above is generalized to its superordinate concept and the remaining operations are unchanged.
Fig. 5 shows a schematic structural diagram of the synonymy relationship identification unit 203 of the apparatus shown in fig. 2, which includes:
a HIT Tongyici Cilin synonymy relation recognition unit 501, configured to obtain the synonym set SSET_A of the word A from the rows marked with "=", which denote synonyms, in the HIT Tongyici Cilin (Extended); if B ∈ SSET_A, the words A and B have a synonymy relation; otherwise, control passes to the HowNet synonymy relation recognition unit;
a HowNet synonymy relation recognition unit 502, configured to extract the synonym set SSET_A of the word A using HowNet; if B ∈ SSET_A, the words A and B have a synonymy relation; otherwise, control passes to the Baidu Chinese synonymy relation recognition unit;
a Baidu Chinese synonymy relation recognition unit 503, configured to extract the synonym set SSET_A of the word A using Baidu Chinese; if B ∈ SSET_A, the words A and B have a synonymy relation; otherwise, control passes to the Baidu Encyclopedia synonymy relation recognition unit;
a Baidu Encyclopedia synonymy relation recognition unit 504, configured to obtain the encyclopedia link page sets PSET_A and PSET_B of the words A and B respectively according to the page links of Baidu Encyclopedia; if PSET_A ∩ PSET_B ≠ ∅, the words A and B have a synonymy relation; otherwise, control passes to the superior-inferior relation recognition unit.
Fig. 6 shows a schematic structural diagram of the superior-inferior relation identification unit 204 of the apparatus shown in fig. 2, which includes:
a HowNet superior-inferior relation recognition unit 601, configured to extract the hyponym sets HSET_A and HSET_B of the words A and B respectively using HowNet; if B ∈ HSET_A or A ∈ HSET_B, the words A and B have a superior-inferior relation; otherwise, control passes to the sememe definition superior-inferior relation recognition unit;
a sememe definition superior-inferior relation recognition unit 602, configured to extract the sememe definition sets DEFSET_A and DEFSET_B of the words A and B respectively according to the superior-inferior relation implied by the HowNet sememe definitions; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B whose primary sememes are identical, and the set of sememes contained in DEF_A is a proper subset of the set of sememes contained in DEF_B, or the set of sememes contained in DEF_B is a proper subset of the set of sememes contained in DEF_A, then the words A and B have a superior-inferior relation.
Fig. 7 shows a schematic structure diagram of the chinese-to-english translation unit 205 of the apparatus shown in fig. 2, which includes:
the Chinese-English translation unit 701 is used for translating the words A and B into corresponding English sets ENSET (A) and ENSET (B) respectively by using a Chinese-English dictionary.
Fig. 8 is a schematic structural diagram of the english word semantic relation identifying unit 206 of the apparatus shown in fig. 2, which includes:
an English antisense relation recognition unit 801, configured to, for each pair of English words EN_A ∈ ENSET(A) and EN_B ∈ ENSET(B), extract the antisense word set ENASET_A of the word EN_A from the English knowledge resource; if EN_B ∈ ENASET_A, the English words EN_A and EN_B have an antisense relation, i.e., the original Chinese word pair has an antisense relation; otherwise, control passes to the English integral part relation recognition unit;
an English integral part relation recognition unit 802, configured to, for each pair of English words EN_A ∈ ENSET(A) and EN_B ∈ ENSET(B), extract the partial word sets ENMSET_A and ENMSET_B of the words EN_A and EN_B respectively from the English knowledge resource; if EN_B ∈ ENMSET_A or EN_A ∈ ENMSET_B, the English words EN_A and EN_B have an integral part relation, i.e., the original Chinese word pair has an integral part relation; otherwise, control passes to the English synonymy relation recognition unit;
an English synonymy relation recognition unit 803, configured to, for each pair of English words EN_A ∈ ENSET(A) and EN_B ∈ ENSET(B), extract the synonym set ENSSET_A of the word EN_A from the English knowledge resource; if EN_B ∈ ENSSET_A, the English words EN_A and EN_B have a synonymy relation, i.e., the original Chinese word pair has a synonymy relation; otherwise, control passes to the English superior-inferior relation recognition unit;
an English superior-inferior relation recognition unit 804, configured to, for each pair of English words EN_A ∈ ENSET(A) and EN_B ∈ ENSET(B), extract the hyponym sets ENHSET_A and ENHSET_B of the words EN_A and EN_B respectively from the English knowledge resource; if EN_B ∈ ENHSET_A or EN_A ∈ ENHSET_B, the English words EN_A and EN_B have a superior-inferior relation, i.e., the original Chinese word pair has a superior-inferior relation.
The Chinese word semantic relation recognition device combining Chinese and English knowledge resources shown in fig. 2 to 8 can be integrated into various hardware entities. For example, a Chinese term semantic relation recognition device that combines Chinese and English knowledge resources can be integrated into: personal computers, smart phones, workstations, and the like.
The method for recognizing semantic relations of Chinese words by combining Chinese and English knowledge resources, provided by the embodiment of the invention, can be stored on various storage media in the form of instructions or instruction sets. Such storage media include, but are not limited to: floppy disks, optical disks, hard disks, memory, USB flash drives, CF cards, SM cards, etc.
In summary, in the embodiment of the present invention, the antisense word set is obtained by combining multiple chinese knowledge resources, and whether the semantic relationship between words has an antisense relationship is determined according to the antisense word set; extracting a partial word set by using various Chinese knowledge resources, and judging whether integral partial relations exist among words or not according to the partial word set; extracting a synonym set by utilizing various Chinese knowledge resources, and judging whether synonym relations exist among the words or not based on the synonym set; extracting a hyponym set by means of multiple Chinese knowledge resources, and judging whether the words have a superior-subordinate relationship according to the hyponym set; translating the Chinese word pair into English by using a Chinese-English dictionary; and performing word semantic relation recognition on English word pairs obtained by translating Chinese and English by using English knowledge resources to determine the semantic relation of the original Chinese word pairs. Therefore, after the embodiment of the invention is applied, the Chinese word semantic relation recognition combining Chinese and English knowledge resources is realized. 
The implementation mode of the invention can utilize various different Chinese knowledge resources to identify the semantic relation of the words, and fully utilizes each knowledge resource; in the whole part identification process, aiming at the characteristics of the definition of the HowNet sememe, the method is supplemented by a generalization method, so that the adaptability of the identification method is improved; in the process of identifying the upper and lower relations, the invention fully excavates the information contained in the definition in HowNet, thereby effectively improving the accuracy of identification; the invention combines Chinese and English knowledge resources, and utilizes the English knowledge resources to supplement word semantic relations which are not covered by the Chinese knowledge resources, thereby improving the recognition rate; the Chinese word semantic relation recognition method and device combining Chinese and English knowledge resources, provided by the invention, can automatically recognize the semantic relation of a given word pair, including antisense relation, integral part relation, synonymous relation and superior-inferior relation, and has higher recognition accuracy.
The embodiments in this specification are described in a progressive manner, and like parts may be referred to each other. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, it is relatively simple to describe, and reference may be made to some descriptions of the method embodiment for relevant points.
The foregoing detailed description of the embodiments of the present invention has been presented for purposes of illustration and description and is intended to be exemplary only of the method and apparatus for practicing the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and therefore the present specification should not be construed as limiting the present invention.

Claims (5)

1. A Chinese word semantic relation recognition method combining Chinese and English knowledge resources is characterized by comprising the following steps:
step one, acquiring an antisense word set by combining various Chinese knowledge resources, and judging whether the words have an antisense relation according to the antisense word set;
step 1-1) for given words A and B, extracting the antisense word set ASET_A of the word A by using the antisense relations explicitly defined in HowNet; if B ∈ ASET_A, the words A and B have an antisense relation, otherwise, turning to step 1-2); in addition, the converse relation defined in HowNet is also treated as an antisense relation;
step 1-2) extracting the antisense word set ASET_A of the given word A by using Baidu Chinese, extracting the synonym set SSET_A of the word A by using the HIT Tongyici Cilin (Extended), and, for each word W ∈ SSET_A, extracting its antisense words and merging them into ASET_A; if B ∈ ASET_A, the words A and B have an antisense relation, otherwise, turning to step 1-3);
step 1-3) extracting the antisense word set ASET_A of the word A by using Baidu Encyclopedia; if B ∈ ASET_A, the words A and B have an antisense relation, otherwise, turning to step 2-1);
step two, extracting a partial word set by using various Chinese knowledge resources, and judging whether an integral part relation exists between the words according to the partial word set;
step 2-1) extracting the partial word sets MSET_A and MSET_B of the words A and B respectively by using HowNet; if B ∈ MSET_A or A ∈ MSET_B, the words A and B have an integral part relation, otherwise, turning to step 2-2);
step 2-2) processing by using the HowNet sememe definitions, wherein a definition containing the sememe "part" indicates that the word is a partial word of a certain word, and the value of the "whole" attribute in the definition gives the sememe definition of its whole word; on this basis, extracting the sets DEFSET_A and DEFSET_B of all sememe definitions of the words A and B; DEF_A and DEF_B denote one sememe definition of the words A and B respectively; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B such that DEF_A contains the attribute "whole" with the value DEF_B, or DEF_B contains the attribute "whole" with the value DEF_A, the words A and B have an integral part relation, otherwise, turning to step 3-1);
in addition, for some words the integral part relation cannot be effectively recognized by directly using the sememe definitions; such words can be processed by generalization, in which the value of the "whole" attribute is generalized to its superordinate concept and the remaining operations are unchanged;
step three, extracting a synonym set by utilizing various Chinese knowledge resources, and judging whether a synonymy relation exists between the words based on the synonym set;
step four, extracting a hyponym set by means of various Chinese knowledge resources, and judging whether a superior-inferior relation exists between the words according to the hyponym set;
step five, translating the Chinese word pair into English by using a Chinese-English dictionary;
step 5-1) translating the words A and B into the corresponding English sets ENSET_A and ENSET_B respectively by using a Chinese-English dictionary;
step six, performing word semantic relation recognition on the English word pairs obtained in step five by using English knowledge resources, to determine the semantic relation of the original Chinese word pair;
step 6-1) for each pair of English words EN_A ∈ ENSET_A and EN_B ∈ ENSET_B, extracting the antisense word set ENASET_A of the word EN_A according to the English knowledge resources; if EN_B ∈ ENASET_A, the English words EN_A and EN_B have an antisense relation, namely the Chinese words A and B in step 5-1) have an antisense relation, otherwise, turning to step 6-2);
step 6-2) for each pair of English words EN_A ∈ ENSET_A and EN_B ∈ ENSET_B, extracting the partial word sets ENMSET_A and ENMSET_B of the words EN_A and EN_B respectively according to the English knowledge resources; if EN_B ∈ ENMSET_A or EN_A ∈ ENMSET_B, the English words EN_A and EN_B have an integral part relation, namely the Chinese words A and B in step 5-1) have an integral part relation, otherwise, turning to step 6-3);
step 6-3) for each pair of English words EN_A ∈ ENSET_A and EN_B ∈ ENSET_B, extracting the synonym set ENSSET_A of the word EN_A according to the English knowledge resources; if EN_B ∈ ENSSET_A, the English words EN_A and EN_B have a synonymy relation, namely the Chinese words A and B in step 5-1) have a synonymy relation, otherwise, turning to step 6-4);
step 6-4) for each pair of English words EN_A ∈ ENSET_A and EN_B ∈ ENSET_B, extracting the hyponym sets ENHSET_A and ENHSET_B of the words EN_A and EN_B respectively according to the English knowledge resources; if EN_B ∈ ENHSET_A or EN_A ∈ ENHSET_B, the English words EN_A and EN_B have a superior-inferior relation, namely the Chinese words A and B in step 5-1) have a superior-inferior relation.
2. The method for recognizing semantic relations of Chinese words and phrases in combination with Chinese and English knowledge resources according to claim 1, wherein in the third step, when determining the synonymous relations, the method specifically comprises:
step 3-1) obtaining the synonym set SSET_A of the word A from the rows marked with "=", which denote synonyms, in the HIT Tongyici Cilin (Extended); if B ∈ SSET_A, the words A and B have a synonymy relation, otherwise, turning to step 3-2);
step 3-2) extracting the synonym set SSET_A of the word A by utilizing HowNet; if B ∈ SSET_A, the words A and B have a synonymy relation, otherwise, turning to step 3-3);
step 3-3) extracting the synonym set SSET_A of the word A by utilizing Baidu Chinese; if B ∈ SSET_A, the words A and B have a synonymy relation, otherwise, turning to step 3-4);
step 3-4) obtaining the encyclopedia link page sets PSET_A and PSET_B of the words A and B respectively according to the page links of Baidu Encyclopedia; if PSET_A ∩ PSET_B ≠ ∅, the words A and B have a synonymy relation, otherwise, turning to step 4-1).
3. The method for recognizing semantic relations of Chinese words and phrases in combination with Chinese and English knowledge resources according to claim 1, wherein in the fourth step, when determining the upper and lower relations, the following steps are specifically performed:
step 4-1) extracting the hyponym sets HSET_A and HSET_B of the words A and B respectively by using HowNet; if B ∈ HSET_A or A ∈ HSET_B, the words A and B have a superior-inferior relation, otherwise, turning to step 4-2);
step 4-2) extracting the sets DEFSET_A and DEFSET_B of all sememe definitions of the words A and B respectively according to the superior-inferior relation implied by the HowNet sememe definitions; DEF_A and DEF_B denote one sememe definition of the words A and B respectively; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B whose primary sememes are identical, and the set of sememes contained in DEF_A is a proper subset of the set of sememes contained in DEF_B, or the set of sememes contained in DEF_B is a proper subset of the set of sememes contained in DEF_A, then the words A and B have a superior-inferior relation.
4. A Chinese word semantic relation recognition device combined with Chinese and English knowledge resources is characterized by comprising an antisense relation recognition unit, an integral part relation recognition unit, a synonym relation recognition unit and an upper and lower relation recognition unit, wherein:
the antisense relation identification unit is used for acquiring an antisense word set by using various Chinese knowledge resources and judging whether the semantic relation among the words has an antisense relation or not according to the antisense word set;
the integral part relation recognition unit is used for extracting a part word set by using various Chinese knowledge resources and judging whether integral part relations exist among the words or not according to the part word set;
the synonymy relation identification unit is used for extracting a synonymy set by utilizing various Chinese knowledge resources and judging whether synonymy relations exist among the words or not based on the synonymy set;
the upper and lower relation identification unit is used for extracting a lower word set by means of various Chinese knowledge resources and judging whether the upper and lower relations exist among the words or not according to the lower word set;
the Chinese-English translation unit is used for translating the Chinese word pair into English by using a Chinese-English dictionary;
and the English word semantic relation recognition unit is used for recognizing the word semantic relation of the English word pair obtained by the Chinese-English translation unit by using English knowledge resources so as to determine the semantic relation of the original Chinese word pair.
5. The Chinese word semantic relation recognition device combining Chinese and English knowledge resources according to claim 4, comprising:
a HowNet antonym relation recognition unit, used for, given words A and B, extracting the antonym set ASET_A of the word A by using the antonym relations explicitly defined in HowNet; if B ∈ ASET_A, the words A and B have an antonym relation; otherwise, proceeding to the Baidu Chinese antonym relation recognition unit; wherein the converse relation defined in HowNet is also treated as an antonym relation;
a Baidu Chinese antonym relation recognition unit, used for extracting the antonym set ASET_A of the given word A by using Baidu Chinese; extracting the synonym set SSET_A of the word A by using the extended edition of the Harbin Institute of Technology (HIT) synonym thesaurus Tongyici Cilin; for each word W ∈ SSET_A, extracting the antonyms of W and merging them into ASET_A; if B ∈ ASET_A, the words A and B have an antonym relation; otherwise, proceeding to the Baidu Encyclopedia antonym relation recognition unit;
a Baidu Encyclopedia antonym relation recognition unit, used for extracting the antonym set ASET_A of the word A by using Baidu Encyclopedia; if B ∈ ASET_A, the words A and B have an antonym relation; otherwise, proceeding to the whole-part relation recognition unit;
a HowNet whole-part relation recognition unit, used for extracting the part word sets MSET_A and MSET_B of the words A and B, respectively, by using HowNet; if B ∈ MSET_A or A ∈ MSET_B, the words A and B have a whole-part relation; otherwise, proceeding to the sememe-definition whole-part relation recognition unit;
a sememe-definition whole-part relation recognition unit, used for processing by means of the sememe definitions in HowNet, wherein a word whose definition contains the sememe 'part' is a component word or a part word of some word, and the value of the 'whole' attribute in the definition indicates the definition of the corresponding whole word; extracting the definition sets DEFSET_A and DEFSET_B of all senses of the words A and B; letting DEF_A and DEF_B denote the definition of one sense of the words A and B, respectively; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B such that DEF_A contains the attribute 'whole' with the value DEF_B, or DEF_B contains the attribute 'whole' with the value DEF_A, the words A and B have a whole-part relation; otherwise, proceeding to the synonym relation recognition unit;
in addition, in the sememe-definition whole-part relation recognition unit, for words whose whole-part relation cannot be recognized effectively from the definitions directly, generalization is applied: the value of the 'whole' attribute in the definition is generalized to its superordinate concept, and all other operations remain unchanged;
a Cilin synonym relation recognition unit, used for obtaining the synonym set SSET_A of the word A from the lines marked with '=' in the extended edition of the Harbin Institute of Technology (HIT) synonym thesaurus Tongyici Cilin; if B ∈ SSET_A, the words A and B have a synonym relation; otherwise, proceeding to the HowNet synonym relation recognition unit;
a HowNet synonym relation recognition unit, used for extracting the synonym set SSET_A of the word A by using HowNet; if B ∈ SSET_A, the words A and B have a synonym relation; otherwise, proceeding to the Baidu Chinese synonym relation recognition unit;
a Baidu Chinese synonym relation recognition unit, used for extracting the synonym set SSET_A of the word A by using Baidu Chinese; if B ∈ SSET_A, the words A and B have a synonym relation; otherwise, proceeding to the Baidu Encyclopedia synonym relation recognition unit;
a Baidu Encyclopedia synonym relation recognition unit, used for acquiring, according to the page links of Baidu Encyclopedia, the encyclopedia link page sets PSET_A and PSET_B of the words A and B, respectively; if the condition
Figure FDA0002822602980000041
is satisfied, the words A and B have a synonym relation; otherwise, proceeding to the hypernym-hyponym relation recognition unit;
a HowNet hypernym-hyponym relation recognition unit, used for extracting the hyponym sets HSET_A and HSET_B of the words A and B, respectively, by using HowNet; if B ∈ HSET_A or A ∈ HSET_B, the words A and B have a hypernym-hyponym relation; otherwise, proceeding to the sememe-definition hypernym-hyponym relation recognition unit;
a sememe-definition hypernym-hyponym relation recognition unit, used for, according to the hypernym-hyponym relations implied by the sememe definitions in HowNet, extracting the definition sets DEFSET_A and DEFSET_B of all senses of the words A and B; letting DEF_A and DEF_B denote the definition of one sense of the words A and B, respectively; if there exist DEF_A ∈ DEFSET_A and DEF_B ∈ DEFSET_B whose primary sememes coincide and for which the set of sememes contained in DEF_A can be regarded as a proper subset of the set of sememes contained in DEF_B, or the set of sememes contained in DEF_B can be regarded as a proper subset of the set of sememes contained in DEF_A, the words A and B have a hypernym-hyponym relation;
a Chinese-English translation unit, used for translating the words A and B into the corresponding English word sets ENSET_A and ENSET_B, respectively, by using a Chinese-English dictionary;
an English antonym relation recognition unit, used for, for each pair of English words EN_A ∈ ENSET_A and EN_B ∈ ENSET_B, extracting the antonym set ENASET_A of the word EN_A according to English knowledge resources; if EN_B ∈ ENASET_A, the English words EN_A and EN_B have an antonym relation, that is, the Chinese words A and B have an antonym relation; otherwise, proceeding to the English whole-part relation recognition unit;
an English whole-part relation recognition unit, used for, for each pair of English words EN_A ∈ ENSET_A and EN_B ∈ ENSET_B, extracting the part word sets ENMSET_A and ENMSET_B of the words EN_A and EN_B, respectively, according to English knowledge resources; if EN_B ∈ ENMSET_A or EN_A ∈ ENMSET_B, the English words EN_A and EN_B have a whole-part relation, that is, the Chinese words A and B have a whole-part relation; otherwise, proceeding to the English synonym relation recognition unit;
an English synonym relation recognition unit, used for, for each pair of English words EN_A ∈ ENSET_A and EN_B ∈ ENSET_B, extracting the synonym set ENSSET_A of the word EN_A according to English knowledge resources; if EN_B ∈ ENSSET_A, the English words EN_A and EN_B have a synonym relation, that is, the Chinese words A and B have a synonym relation; otherwise, proceeding to the English hypernym-hyponym relation recognition unit;
an English hypernym-hyponym relation recognition unit, used for, for each pair of English words EN_A ∈ ENSET_A and EN_B ∈ ENSET_B, extracting the hyponym sets ENHSET_A and ENHSET_B of the words EN_A and EN_B, respectively, according to English knowledge resources; if EN_B ∈ ENHSET_A or EN_A ∈ ENHSET_B, the English words EN_A and EN_B have a hypernym-hyponym relation, that is, the Chinese words A and B have a hypernym-hyponym relation.
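By way of non-limiting illustration, the two sememe-definition tests of claim 5 (the 'whole' attribute test and the proper-subset test) can be sketched as follows. The `Definition` record is a hypothetical simplification of a HowNet definition: it keeps only a primary sememe, the set of sememes, and an optional `whole` value, and the `whole` value is compared with the primary sememe of the other definition, which is an assumption made for brevity and not the representation used by HowNet itself.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Definition:
    """Hypothetical, simplified stand-in for one HowNet sense definition."""
    primary: str                 # primary (first) sememe of the definition
    sememes: FrozenSet[str]      # all sememes, including the primary one
    whole: Optional[str] = None  # value of the 'whole' attribute, if present


def has_whole_part(defs_a: Iterable[Definition], defs_b: Iterable[Definition]) -> bool:
    """True if some DEF_A names DEF_B as its whole, or the other way round."""
    defs_b = list(defs_b)
    return any(
        da.whole == db.primary or db.whole == da.primary
        for da in defs_a
        for db in defs_b
    )


def has_hypernym_hyponym(defs_a: Iterable[Definition], defs_b: Iterable[Definition]) -> bool:
    """True if some pair shares its primary sememe and one sememe set
    is a proper subset of the other."""
    defs_b = list(defs_b)
    return any(
        da.primary == db.primary
        and (da.sememes < db.sememes or db.sememes < da.sememes)  # '<' is proper subset
        for da in defs_a
        for db in defs_b
    )


# Toy definitions (invented for illustration, not taken from HowNet).
vehicle = Definition("vehicle", frozenset({"vehicle"}))
car = Definition("vehicle", frozenset({"vehicle", "land"}))
wheel = Definition("part", frozenset({"part", "vehicle"}), whole="vehicle")
```

With these toy records, `has_hypernym_hyponym([vehicle], [car])` holds because both definitions share the primary sememe and the first sememe set is strictly contained in the second, while `has_whole_part([wheel], [car])` holds through the `whole` value of the part word.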
CN201710706832.9A 2017-08-17 2017-08-17 Chinese word semantic relation recognition method and device combining Chinese and English knowledge resources Active CN107451130B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710706832.9A CN107451130B (en) 2017-08-17 2017-08-17 Chinese word semantic relation recognition method and device combining Chinese and English knowledge resources

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710706832.9A CN107451130B (en) 2017-08-17 2017-08-17 Chinese word semantic relation recognition method and device combining Chinese and English knowledge resources

Publications (2)

Publication Number Publication Date
CN107451130A CN107451130A (en) 2017-12-08
CN107451130B true CN107451130B (en) 2021-04-02

Family

ID=60492720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710706832.9A Active CN107451130B (en) 2017-08-17 2017-08-17 Chinese word semantic relation recognition method and device combining Chinese and English knowledge resources

Country Status (1)

Country Link
CN (1) CN107451130B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902673A (en) * 2019-01-28 2019-06-18 北京明略软件系统有限公司 Table Header information identification and method for sorting, system, terminal and storage medium in table

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2280159A1 (en) * 2009-07-31 2011-02-02 International Engine Intellectual Property Company, LLC. Exhaust gas cooler
CN103473222A (en) * 2013-09-16 2013-12-25 中央民族大学 Semantic ontology creation and vocabulary expansion method for Tibetan language
CN104484411A (en) * 2014-12-16 2015-04-01 中国科学院自动化研究所 Building method for semantic knowledge base based on a dictionary
CN106202034A (en) * 2016-06-29 2016-12-07 齐鲁工业大学 A kind of adjective word sense disambiguation method based on interdependent constraint and knowledge and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
NLPCC 2016 Shared Task Chinese Words Similarity Measure via Ensemble Learning Based on Multiple Resources; Shutian Ma et al.; 《NLPCC 2016: Natural Language Understanding and Intelligent Applications》; 20161202; pp. 862-869 *
SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals; Iris Hendrickx et al.; 《SEW '09: Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions》; 20090630; pp. 94-99 *
Lexical Relation Extraction Based on Dictionaries and the Web (基于词典和Web的词汇关系抽取); Fan Qinghu et al.; 《http://www.doc88.com/p-1146077617476.html》; 20150110; pp. 1-9 *

Also Published As

Publication number Publication date
CN107451130A (en) 2017-12-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant