CN107832290B

CN107832290B - Method and device for identifying Chinese semantic relation

Info

Publication number: CN107832290B
Application number: CN201710980063.1A
Authority: CN
Inventors: 李长亮; 马腾; 程健
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2017-10-19
Filing date: 2017-10-19
Publication date: 2020-02-28
Anticipated expiration: 2037-10-19
Also published as: CN107832290A

Abstract

The invention relates to the field of natural language processing, in particular to a method and a device for identifying Chinese semantic relations, and aims to solve the problem of low accuracy in Chinese semantic relation identification. To this end, the Chinese semantic relationship recognition method of the invention comprises the following steps: step 1, judging whether a preset semantic dictionary contains the Chinese word pair to be detected: if yes, determining the semantic relation of the Chinese word pair to be detected according to the preset semantic dictionary, and if not, performing step 2; step 2, acquiring a first semantic relation of the Chinese word pair to be detected by using the word vectors of the word pair; step 3, acquiring a second semantic relationship of the Chinese word pair to be detected by using the word structure characteristics of the word pair, and adjusting the first semantic relationship according to the second semantic relationship to obtain a final semantic relationship. In this way, words are considered from multiple dimensions, and the semantic relation of Chinese words can be identified efficiently, quickly and accurately.

Description

Method and device for identifying Chinese semantic relation

Technical Field

The invention relates to the technical field of natural language processing, in particular to a method and a device for recognizing Chinese semantic relations.

Background

With the arrival of the big data era and technological breakthroughs in deep learning, natural language processing (NLP) has gradually become a hotspot in computer applications and artificial intelligence research, and the automatic recognition of semantic relationships remains a major challenge in this field. In vocabulary learning, words having different semantic relationships need to be distinguished.

Semantic relation recognition technology mainly comprises methods based on a semantic dictionary and methods based on word vectors. Methods based on a semantic dictionary, such as the synonym forest and HowNet, have the advantages of clear classification and a fast classification process. However, they depend on the construction of the semantic dictionary, which consumes a large amount of manpower, material and financial resources, is difficult to update later, and cannot handle words outside the dictionary.

The semantic recognition method based on the word vector converts the vocabulary semantics into the word vector through a natural language model, and then recognizes the Chinese semantic relation through calculation among the word vectors or a model established based on the word vector.

Disclosure of Invention

In order to solve the above problems in the prior art, namely to solve the technical problem of low recognition accuracy of the Chinese semantic relationship, the invention provides a method and a device for recognizing the Chinese semantic relationship.

In a first aspect, the method for identifying a Chinese semantic relationship of the present invention includes:

step 1, judging whether a preset semantic dictionary contains Chinese word pairs to be detected or not: if yes, determining the semantic relation of the Chinese word pair to be detected according to the preset semantic dictionary, and if not, performing the step 2;

step 2, acquiring a first semantic relationship of the Chinese word pair to be detected by using the word vectors of the Chinese word pair to be detected;

step 3, acquiring a second semantic relationship of the Chinese word pair to be detected by using the word structure characteristics of the Chinese word pair to be detected, and adjusting the first semantic relationship according to the second semantic relationship to obtain a final semantic relationship.

Preferably, the step of "acquiring a first semantic relationship of the Chinese word pair to be detected by using the word vectors of the Chinese word pair to be detected" specifically includes:

acquiring word vectors of Chinese word pairs to be detected according to a preset word vector dictionary;

extracting the characteristics of the word vector, and constructing a combined vector according to the characteristics of the word vector and the part-of-speech information of the Chinese word pair to be detected;

and acquiring the probability of each preset semantic relationship corresponding to the combined vector by using a preset softmax classification model, and taking the semantic relationship with the maximum probability value as the first semantic relationship of the Chinese word pair to be detected.

Preferably, the step of "extracting the features of the word vector and constructing a combined vector according to the features of the word vector and the part-of-speech information of the word pair of the Chinese words to be detected" specifically includes:

calculating the similarity between the word vectors corresponding to the Chinese word pair to be detected;

calculating the difference vector of the word vectors corresponding to the Chinese word pair to be detected;

acquiring the part of speech of each Chinese word to be detected, and encoding the part of speech to obtain the corresponding part-of-speech information;

and fusing the similarity and the difference vector of the word vectors with the part-of-speech information of the Chinese word pair to be detected to form a combined vector.

Preferably, the step of "adjusting the first semantic relationship according to the second semantic relationship to obtain the final semantic relationship" specifically includes:

judging whether the first semantic relation is consistent with the second semantic relation or not, and if so, not adjusting the first semantic relation; and if the two semantic relations are inconsistent, taking the semantic relation with the second highest probability in the preset semantic relations corresponding to the combined vector as the first semantic relation of the word pair of the Chinese words to be detected.

Preferably, the word structure characteristics comprise part-of-speech characteristics, special character characteristics, structure characteristics and single character characteristics; the step of acquiring the second semantic relationship by using the word structure characteristics of the Chinese word pair to be detected specifically comprises the following steps:

judging whether the Chinese word pair to be detected meets a preset part-of-speech judgment condition, if so, obtaining a second semantic relationship according to the preset part-of-speech judgment condition, and if not, judging whether the Chinese word pair to be detected contains a preset special Chinese character;

when the Chinese word pair to be detected contains a preset special Chinese character, deleting the preset special Chinese character and judging whether the modified Chinese word pair to be detected meets a preset structure judgment condition; otherwise, directly judging whether the unmodified Chinese word pair to be detected meets the preset structure judgment condition;

when the Chinese word pair to be detected meets a preset structure judgment condition, obtaining a second semantic relation according to the preset structure judgment condition; when the Chinese word pair to be detected does not meet the preset structure judgment condition, judging whether the Chinese word pair to be detected meets the preset single word judgment condition;

when the Chinese word pair to be detected meets a preset single word judgment condition, obtaining a second semantic relation according to the preset single word judgment condition;

wherein the preset part-of-speech judgment condition includes: if the two Chinese words in the Chinese word pair to be detected are both adjectives or verbs, the second semantic relationship is an antonym relationship;

the preset structure judgment condition includes: if the structural characteristics of the two Chinese words in the Chinese word pair to be detected are ab and bc respectively, or a and ba respectively, the two Chinese words are in a whole-part relationship; if the structural characteristics of the two Chinese words are a and ac respectively, the two Chinese words are in a hypernym-hyponym (upper-lower) relationship; a, b and c are single Chinese characters that differ from one another;

the preset single-character judgment condition includes: if the two Chinese words in the Chinese word pair to be detected are both single Chinese characters that differ from each other, the two Chinese words are antonyms.

In a second aspect, the apparatus for identifying a Chinese semantic relationship of the present invention includes:

the judging module is configured to judge whether the preset semantic dictionary contains Chinese word pairs to be detected: if so, determining the semantic relation of the Chinese word pair to be detected according to the preset semantic dictionary;

the acquisition module is configured to acquire a first semantic relationship by using word vectors of the Chinese word pairs to be detected after the judgment module judges that the preset semantic dictionary does not contain the Chinese word pairs to be detected;

and the adjusting module is configured to acquire a second semantic relationship of the Chinese word pair to be detected by using the word structure characteristics of the Chinese word pair to be detected, and adjust the first semantic relationship according to the second semantic relationship to obtain a final semantic relationship.

Preferably, the obtaining module includes:

the word vector acquisition unit is configured to acquire word vectors of word pairs of Chinese words to be detected according to a preset word vector dictionary;

the extraction and construction unit is configured to extract the characteristics of the word vector and construct a combined vector according to the characteristics of the word vector and the part-of-speech information of the word pair of the Chinese words to be detected;

and the semantic relation obtaining unit is configured to obtain the probability of each preset semantic relation corresponding to the combined vector by using a preset softmax classification model, and use the semantic relation with the maximum probability value as the first semantic relation of the Chinese word pair to be detected.

Preferably, the extraction building unit comprises:

the similarity calculation subunit is configured to calculate the similarity between the word vectors corresponding to the Chinese word pair to be detected;

the difference vector calculation subunit is configured to calculate the difference vector of the word vectors corresponding to the Chinese word pair to be detected;

the part-of-speech obtaining subunit is configured to obtain the part of speech of each Chinese word to be detected and encode the part of speech to obtain the corresponding part-of-speech information;

and the fusion subunit is configured to fuse the similarity and the difference vector of the word vectors with the part-of-speech information of the Chinese word pair to be detected to form a combined vector.

Preferably, the adjusting module comprises a judging unit; the judging unit is configured to judge whether the first semantic relationship is consistent with the second semantic relationship, and if so, not to adjust the first semantic relationship; and if the two semantic relations are inconsistent, taking the semantic relation with the second highest probability in the preset semantic relations corresponding to the combined vector as the first semantic relation of the word pair of the Chinese words to be detected.

Preferably, the word structure characteristics comprise part-of-speech characteristics, special character characteristics, structure characteristics and single character characteristics; the adjustment module further comprises:

the part-of-speech judging unit is configured to judge whether the word pair of the Chinese words to be detected meets a preset part-of-speech judging condition, and if so, a second semantic relationship is obtained according to the preset part-of-speech judging condition;

the special character judging unit is configured to judge, when the part-of-speech judging unit cannot obtain the second semantic relationship, whether the Chinese word pair to be detected contains a preset special Chinese character; if so, to delete the preset special Chinese character and input the modified Chinese word pair to be detected to the structure judging unit; if not, to input the Chinese word pair to be detected directly to the structure judging unit;

the structure judging unit is configured to judge whether the Chinese word pair to be detected meets a preset structure judging condition when the part of speech judging unit cannot obtain the second semantic relationship, and obtain the second semantic relationship according to the preset structure judging condition when the Chinese word pair to be detected meets the preset structure judging condition;

the single word judging unit is configured to judge whether the Chinese word pair to be detected meets a preset single word judging condition when the structure judging unit cannot obtain the second semantic relationship, and obtain the second semantic relationship according to the preset single word judging condition when the Chinese word pair to be detected meets the preset single word judging condition.

Compared with the closest prior art, the technical scheme at least has the following beneficial effects:

1. In the method for identifying the Chinese semantic relationship, a Chinese semantic relationship classification system is constructed by combining a semantic dictionary, a machine learning model and linguistic knowledge, and the semantic relationship of words is considered from three dimensions, so that the identification accuracy of the Chinese semantic relationship is greatly improved.

2. According to the method for identifying the Chinese semantic relationship, the words are classified according to the similarity degree, the difference degree and the part of speech angle between the words, so that the identification accuracy of the Chinese semantic relationship is improved.

Drawings

FIG. 1 is a flow chart illustrating the main steps of a method for recognizing Chinese semantic relationships according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the main steps of step S3 in the Chinese semantic relationship recognition method according to the embodiment of the present invention;

FIG. 3 is a schematic diagram of the main steps of step S4 in the Chinese semantic relationship recognition method according to the embodiment of the present invention;

FIG. 4 is a main flow of a method for recognizing Chinese semantic relationship of Chinese word pairs according to an embodiment of the present invention.

Detailed Description

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.

The Chinese semantic relationship recognition method of the invention integrates a semantic dictionary, a word-vector-based model and linguistic knowledge into one Chinese word semantic relationship recognition system. It absorbs the advantages of the semantic dictionary, takes the word-vector-based model as its main body, then incorporates linguistic knowledge, and thus measures the semantic relationship of Chinese words from multiple angles.

The following describes a method for identifying a Chinese semantic relationship in an embodiment of the present invention with reference to the accompanying drawings.

Referring to FIG. 1, FIG. 1 exemplarily shows the main steps of a method for identifying a Chinese semantic relationship. As shown in FIG. 1, the method for identifying a Chinese semantic relationship in this embodiment may include step S1, step S2, step S3 and step S4.

Step S1, judging whether the preset semantic dictionary contains the Chinese word pair to be detected: if so, go to step S2; otherwise, go to step S3.

And step S2, determining the semantic relation of the Chinese word pair to be detected according to a preset semantic dictionary.

Specifically, the preset semantic dictionary in this embodiment includes a synonym forest and an antonym dictionary, where the synonym forest may be the synonym forest published by the Information Retrieval Research Center of Harbin Institute of Technology, and the antonym dictionary may be an antonym dictionary constructed from common antonyms. In this embodiment, a dictionary lookup program may be written against the preset semantic dictionary to judge whether a given Chinese word pair is contained in the preset semantic dictionary and, if so, to determine the semantic relationship of the Chinese word pair.
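The dictionary stage of steps S1 and S2 can be pictured with a minimal Python sketch. The dictionary contents, the relation labels and the name `lookup_relation` are hypothetical and serve only as an illustration; the embodiment does not specify how the dictionary program is implemented. The sketch assumes that the order of the two words in a pair does not matter.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical miniature semantic dictionary: unordered word pair -> relation label.
SEMANTIC_DICT: Dict[FrozenSet[str], str] = {
    frozenset(("高兴", "快乐")): "synonym",
    frozenset(("大", "小")): "antonym",
}


def lookup_relation(word_a: str, word_b: str) -> Optional[str]:
    """Return the dictionary relation of the pair, or None if the pair is absent.

    A None result corresponds to leaving the dictionary stage (step S3).
    """
    return SEMANTIC_DICT.get(frozenset((word_a, word_b)))
```

A pair found in the dictionary is resolved immediately, and any other pair falls through to the word-vector stage.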

Step S3, acquiring a first semantic relationship of the Chinese word pair to be detected by using the word vectors of the word pair.

Referring to FIG. 2, FIG. 2 exemplarily shows the main steps of step S3 in the method for identifying a Chinese semantic relationship in this embodiment. As shown in FIG. 2, step S3 includes step S31, step S32 and step S33.

And step S31, acquiring word vectors of the Chinese word pairs to be detected according to the preset word vector dictionary.

Further, constructing the preset word vector dictionary in the present embodiment may include step S311 and step S312.

In step S311, a word segmentation tool is used to segment the preset Chinese corpus to obtain a Chinese vocabulary.

Specifically, in this embodiment a large-scale Baidu Encyclopedia Chinese corpus of 15.3G is preset, and a Chinese word segmentation tool is used to segment the corpus into Chinese words.

Step S312, training the Chinese vocabulary by using a preset word vector tool to obtain a preset word vector dictionary.

Specifically, the preset word vector tool in this embodiment may be the Word2vec tool. The Chinese vocabulary obtained in step S311 is trained with the Word2vec tool to obtain a word vector dictionary in which each word corresponds to one word vector.

And step S32, extracting the characteristics of the word vectors, and constructing a combined vector according to the characteristics of the word vectors and the part-of-speech information of the Chinese word pairs to be detected.

Further, step S32 in this embodiment mainly includes the following steps:

Step S321, calculating the similarity between the word vectors corresponding to the Chinese word pair to be detected.

Step S322, calculating the difference vector of the word vectors corresponding to the Chinese word pair to be detected.

Step S323, acquiring the part of speech of each Chinese word to be detected, and encoding the part of speech to obtain the corresponding part-of-speech information.

Step S324, fusing the similarity and the difference vector of the word vectors with the part-of-speech information of the Chinese word pair to be detected to form a combined vector.

Specifically, in a preferred technical solution of this embodiment, for a given pair of Chinese words A and B, the word vectors of A and B are obtained through step S31 and denoted W(A) and W(B) respectively. Principal component analysis (PCA) is performed on W(A) and W(B) for dimensionality reduction, and the reduced vectors are denoted W(A1) and W(B1). The cosine similarity of W(A1) and W(B1) is then calculated as the similarity COS_S of the Chinese words A and B, as shown in the following formula (1):

$$COS_S = \frac{W(A1)\cdot W(B1)}{\lVert W(A1)\rVert \, \lVert W(B1)\rVert} \qquad (1)$$

Calculating the similarity of A and B simply as the cosine similarity of the original word vectors cannot describe the similarity of each word pair well. To obtain a more accurate similarity of the Chinese words A and B, principal component dimensionality reduction is applied to the word vectors W(A) and W(B). Principal component analysis uses a linear transformation to convert many indexes into a few comprehensive indexes (principal components) with little loss of information. The principal components are uncorrelated with one another, so they have better properties than the original variables and simplify the system structure. The dimensionality reduction therefore improves the similarity measurement of the Chinese words A and B to a certain extent.

The difference vector W between W (A) and W (B) is shown in the following formula (2):

W＝W(A)-W(B) (2)

The parts of speech of the Chinese words A and B are obtained and encoded separately. Taking the Chinese word A as an example, the encoding is as follows: if the part of speech of A is an adjective, P₁ = 1; if it is a verb, P₁ = 2; if it is a noun, P₁ = 3; if it is any part of speech other than adjective, verb and noun, P₁ = 4. The codes of the Chinese words A and B obtained in this way are denoted P₁ and P₂ respectively. The similarity and the difference vector of the word vectors are fused with the part-of-speech information of the Chinese word pair to be detected, and the resulting combined vector x is shown in the following formula (3):

x＝(W，COS_S，P₁，P₂) (3)
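As an illustration of formulas (2) and (3), the following Python sketch assembles a combined vector from two word vectors and two part-of-speech tags. The function names and the part-of-speech tag strings are hypothetical. The principal component dimensionality reduction that the embodiment applies before computing the similarity is deliberately omitted here, so the cosine similarity is taken on the input vectors as given.

```python
import math
from typing import List, Sequence

# Part-of-speech codes as described in the text: adjective 1, verb 2, noun 3, other 4.
POS_CODES = {"adjective": 1, "verb": 2, "noun": 3}


def encode_pos(pos: str) -> int:
    """Map a part-of-speech tag (hypothetical tag strings) to its code."""
    return POS_CODES.get(pos, 4)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_vector(w_a: Sequence[float], w_b: Sequence[float],
                    pos_a: str, pos_b: str) -> List[float]:
    """Build x = (W, COS_S, P1, P2) with W = W(A) - W(B)."""
    diff = [a - b for a, b in zip(w_a, w_b)]   # formula (2)
    cos_s = cosine_similarity(w_a, w_b)        # PCA step omitted in this sketch
    return diff + [cos_s, float(encode_pos(pos_a)), float(encode_pos(pos_b))]
```

For two d-dimensional word vectors the resulting vector has d + 3 components, which is the input size a downstream classifier would need to accept.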

and step S33, acquiring the probability of each preset semantic relationship corresponding to the combined vector by using a preset softmax classification model, and taking the semantic relationship with the maximum probability value as the first semantic relationship of the Chinese word pair to be detected.

Specifically, the preset softmax classification model is constructed on the basis of a deep learning convolutional neural network. 850 pairs of Chinese words may be collected by web search as a training set to train the softmax classification model, and 200 pairs of Chinese words may be selected from the data released for NLPCC 2017 Task 1 as a validation set. The probability h_θ(x⁽ⁱ⁾) of each preset semantic relationship corresponding to the combined vector x, obtained with the preset softmax classification model, is shown in the following formula (4):

$$h_\theta(x^{(i)}) = \begin{bmatrix} p(y^{(i)}=1 \mid x^{(i)};\theta) \\ p(y^{(i)}=2 \mid x^{(i)};\theta) \\ \vdots \\ p(y^{(i)}=k \mid x^{(i)};\theta) \end{bmatrix} = \frac{1}{\sum_{j=1}^{k} e^{\theta_j^{T} x^{(i)}}} \begin{bmatrix} e^{\theta_1^{T} x^{(i)}} \\ e^{\theta_2^{T} x^{(i)}} \\ \vdots \\ e^{\theta_k^{T} x^{(i)}} \end{bmatrix} \qquad (4)$$

where i = 1, 2, …, m, m is the sample size, y is the classification label, k is the number of classification labels, θ = (θ₁, θ₂, …, θ_k), and T denotes the transpose.

Each output of the preset softmax classification model lies between 0 and 1, and for each Chinese word pair the probabilities over all semantic relationship categories sum to 1. The semantic relationship with the maximum probability value calculated by formula (4) is used as the first semantic relationship of the Chinese word pair to be detected.
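The classification in step S33 can be sketched in Python as follows. This is a plain linear softmax over a given parameter matrix, not the convolutional network of the embodiment; the parameter values, the label names and the function names are hypothetical. It only shows how class scores are turned into probabilities that sum to 1 and how the relations are ranked by probability.

```python
import math
from typing import Dict, List, Sequence


def softmax_probabilities(theta: Sequence[Sequence[float]],
                          x: Sequence[float],
                          labels: Sequence[str]) -> Dict[str, float]:
    """Return one probability per label from the scores theta_j . x."""
    scores = [sum(t * xi for t, xi in zip(row, x)) for row in theta]
    max_score = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


def rank_relations(probs: Dict[str, float]) -> List[str]:
    """Labels sorted from most to least probable."""
    return sorted(probs, key=probs.get, reverse=True)
```

The first entry of the ranking plays the role of the first semantic relationship, and the second entry is the candidate used when a later adjustment is needed.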

In the method for identifying the Chinese semantic relationship, the semantic relationship is classified with a softmax classifier, a machine learning algorithm. In this process the semantic relationship of Chinese words is considered through three kinds of features: the COS_S feature describes the degree of similarity between the words, the W feature describes the degree of difference between the words, and P₁ and P₂ classify the words from the part-of-speech perspective, so that the accuracy of the first semantic relationship obtained in the invention is greatly improved.

And step S4, acquiring a second semantic relationship of the Chinese word pair to be detected by using the word structure characteristics of the word pair, and adjusting the first semantic relationship according to the second semantic relationship to obtain a final semantic relationship.

Specifically, the word structure characteristics of the Chinese word pair mainly include a part-of-speech characteristic, a structure characteristic, a special character characteristic and a single-character characteristic. The part-of-speech characteristic refers to the part-of-speech category of the Chinese words. Chinese parts of speech are mainly divided into nouns, verbs, adjectives, quantifiers, pronouns, adverbs and the like. Antonym relationships rarely occur between nouns and mostly occur between verbs or adjectives.

The structure characteristic refers to the structure of the Chinese word pair. According to the characteristics of Chinese words, for word pairs of the form ab and bc, or a and ba, such as "automobile" and "automobile tire", or "flower" and "flower bud", the semantic relationship is in many cases a whole-part relationship. For word pairs of the form a and ac, such as "flower" and "chrysanthemum", the semantic relationship is usually a hypernym-hyponym (upper-lower) relationship, where a, b and c are single Chinese characters that differ from one another.

The special character characteristic means that a Chinese word contains a character without practical meaning, and such special characters do not affect the semantic relationship of the Chinese words. Many Chinese words contain such characters, for example the character 子 in 椅子 ("chair"). For words of this type the corresponding characteristic can be extracted: for "chair" and "chair back", for example, the structure characteristic can be extracted after removing 子.

The single-character characteristic refers to words consisting of a single Chinese character. According to the characteristics of Chinese vocabulary, single-character words such as "big" and "small" are in many cases antonyms.

Further, referring to fig. 3, fig. 3 exemplarily shows main steps of step S4 in the chinese semantic relationship recognition method according to the embodiment of the present invention, and as shown in fig. 3, step S4 includes step S41, step S42, step S43, step S44, and step S45.

Step S41, judging whether the Chinese word pair to be detected meets the preset part-of-speech judging condition, if so, obtaining a second semantic relation according to the preset part-of-speech judging condition, and if not, executing step S42.

Specifically, the part-of-speech judgment condition preset in this embodiment includes: if the two Chinese words in the Chinese word pair to be detected are both adjectives or verbs, the second semantic relationship is an antonym relationship.

Step S42, judging whether the Chinese word pair to be detected contains a preset special Chinese character, if so, deleting the preset special Chinese character, and judging whether the modified Chinese word pair to be detected meets a preset structure judgment condition; otherwise, step S43 is executed.

Specifically, the preset special Chinese characters in this embodiment are characters that have no practical meaning in a Chinese word and do not affect the semantic relationship of the Chinese words. If the modified Chinese word pair to be detected meets the preset structure judgment condition, the second semantic relationship is obtained according to the preset structure judgment condition; if it does not, step S44 is executed.

Step S43, judging whether the Chinese word pair to be detected meets a preset structure judgment condition, and when the Chinese word pair to be detected meets the preset structure judgment condition, obtaining a second semantic relationship according to the preset structure judgment condition; when the Chinese word pair to be detected does not meet the preset structure judgment condition, step S44 is executed.

The preset structure judgment condition in this embodiment includes: if the structural characteristics of the two Chinese words in the Chinese word pair to be detected are ab and bc respectively, or a and ba respectively, the two Chinese words are in a whole-part relationship; if the structural characteristics of the two Chinese words are a and ac respectively, the two Chinese words are in a hypernym-hyponym (upper-lower) relationship; a, b and c are each a single Chinese character and differ from one another. For example, for words of the a and ba type such as "flower" and "chrysanthemum", the Chinese semantic relationship is a hypernym-hyponym relationship; for words of the ab and bc type such as "car" and "tire", it is a whole-part relationship.

Step S44, judging whether the Chinese word pair to be detected meets the preset single-character judgment condition, and when the Chinese word pair to be detected meets the preset single-character judgment condition, obtaining a second semantic relationship according to the preset single-character judgment condition.

Specifically, the preset single-character judgment condition in this embodiment includes: if the two Chinese words in the Chinese word pair to be detected are both single Chinese characters that differ from each other, the two Chinese words are antonyms; otherwise they are synonyms.
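The rule cascade of steps S41 to S44 can be sketched in Python as follows. The sketch applies the character patterns literally as they are written in the judgment conditions (ab and bc, a and ba, a and ac). The special character list, the tag strings and the function names are hypothetical, and three choices are assumptions of this sketch rather than statements of the embodiment: the patterns are tried in both word orders, a word is never reduced to an empty string, and `None` is returned when no rule applies.

```python
from typing import Optional

SPECIAL_CHARS = {"子"}                 # hypothetical preset list of special characters
ANTONYM_POS = {"adjective", "verb"}    # hypothetical part-of-speech tag strings


def strip_special(word: str) -> str:
    """Delete preset special characters, but never return an empty word."""
    stripped = "".join(ch for ch in word if ch not in SPECIAL_CHARS)
    return stripped or word


def structure_rule(first: str, second: str) -> Optional[str]:
    """Apply the structure judgment condition to one ordering of the pair."""
    if len(first) == 2 and len(second) == 2:
        a, b = first
        b2, c = second
        if b == b2 and len({a, b, c}) == 3:        # pattern ab / bc
            return "whole-part"
    if len(first) == 1 and len(second) == 2:
        if second[1] == first and second[0] != first:   # pattern a / ba
            return "whole-part"
        if second[0] == first and second[1] != first:   # pattern a / ac
            return "hypernym-hyponym"
    return None


def second_relation(word_a: str, word_b: str,
                    pos_a: str, pos_b: str) -> Optional[str]:
    """Return the rule-based relation of the pair, or None if no rule applies."""
    # S41: part-of-speech judgment
    if pos_a in ANTONYM_POS and pos_b in ANTONYM_POS:
        return "antonym"
    # S42: delete special characters before the structure judgment
    clean_a, clean_b = strip_special(word_a), strip_special(word_b)
    # S43: structure judgment, tried in both orders (assumption)
    relation = structure_rule(clean_a, clean_b) or structure_rule(clean_b, clean_a)
    if relation is not None:
        return relation
    # S44: single-character judgment
    if len(clean_a) == 1 and len(clean_b) == 1 and clean_a != clean_b:
        return "antonym"
    return None
```

Because the rules run in a fixed order, an earlier rule shadows a later one: a pair of two adjectives is labeled by the part-of-speech rule before its character structure is ever inspected.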

Step S45, judging whether the first semantic relationship is consistent with the second semantic relationship: if so, the first semantic relationship is not adjusted; if not, the semantic relationship with the second highest probability among the preset semantic relationships corresponding to the combined vector is used as the first semantic relationship of the Chinese word pair to be detected.
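The adjustment described above can be illustrated with a short Python sketch. The function name and the label strings are hypothetical. The behavior when no second semantic relationship is available is not spelled out in the text, so keeping the most probable relation in that case is an assumption of this sketch.

```python
from typing import Dict, Optional


def final_relation(probs: Dict[str, float], second: Optional[str]) -> str:
    """Adjust the classifier result with the rule-based second relation.

    probs maps each preset semantic relationship to its softmax probability.
    """
    ranked = sorted(probs, key=probs.get, reverse=True)
    first = ranked[0]
    # Consistent, or nothing to compare against (assumption): keep the first relation.
    if second is None or second == first or len(ranked) < 2:
        return first
    # Inconsistent: fall back to the relation with the second highest probability.
    return ranked[1]
```

Note that, following the text, an inconsistency selects the second most probable relation from the classifier, which is not necessarily the relation proposed by the rules.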

A preferred embodiment of the present invention will be described in detail below with reference to the accompanying drawings. Referring to FIG. 4, FIG. 4 illustrates the main flow of a Chinese semantic relationship recognition method for Chinese word pairs. In this embodiment, the data released for NLPCC 2017 Task 1 may be selected as the test set. As shown in FIG. 4, it is first judged in the dictionary area whether each Chinese word pair of the test set is in the preset semantic dictionary; if so, the Chinese semantic relationship corresponding to the word pair is output; if not, the word pair enters the model area.

In this embodiment, the word vectors corresponding to the Chinese word pair test set are first obtained according to the preset word vector dictionary, then the word vector features are extracted and fused with the part-of-speech information to form combined vectors, and the preset softmax classification model is used to obtain the first semantic relationship of each Chinese word pair in the test set.

In the embodiment, firstly, entering a dictionary area, judging whether the Chinese words A and B are contained in a preset semantic dictionary according to the preset semantic dictionary, and if so, obtaining the semantic relation of the Chinese words A and B; if not, entering a model area.

In this embodiment, the word vectors of the Chinese words A and B are obtained from a preset word vector dictionary. Features of the two word vectors are then extracted and fused with the part-of-speech information to obtain a combined vector, and a preset softmax classification model is used to obtain the first semantic relation between A and B.
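The fusion step can be illustrated with the layout given in claim 3, x = (W, COS_S, P₁, P₂), where W is the difference vector, COS_S the cosine similarity, and P₁, P₂ the encoded values of the two words. The sketch below is an assumption-laden simplification: the function names are hypothetical, the part-of-speech codes are passed in as plain numbers, and the dimensionality reduction mentioned in the claim is omitted.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is treated as having no similarity
    return dot / (norm_u * norm_v)


def combined_vector(vec_a: Sequence[float], vec_b: Sequence[float],
                    pos_a: float, pos_b: float) -> List[float]:
    """Concatenate difference vector, cosine similarity, and the two POS codes."""
    difference = [x - y for x, y in zip(vec_a, vec_b)]  # W = W(A) - W(B)
    return difference + [cosine_similarity(vec_a, vec_b), pos_a, pos_b]
```

For d-dimensional word vectors the result has d + 3 components, which is the input the softmax classification model would receive.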

In this embodiment, the part-of-speech features, special character features, structure features, and single-character features of each Chinese word pair in the test set are extracted in the word feature area as the word structure features, and the second semantic relationship of each Chinese word pair is obtained from them.

In this embodiment, the second semantic relationship of each Chinese word pair in the test set is used to adjust the first semantic relationship, which yields the final semantic relationship of each Chinese word pair.

The accuracy of the Chinese semantic relationship prediction of the invention can be obtained by comparing the prediction result of the semantic relationship of the words in the test set with the actual result, as shown in Table 1:

TABLE 1

Here, the F1 value is the harmonic mean of P and R, F1 = 2PR/(P + R), where P denotes precision and R denotes recall.
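As a worked illustration of the formula, the following sketch computes F1 from P and R. The function name is hypothetical, the zero-denominator guard is an added convention, and the input values in the usage note are arbitrary examples rather than results from the patent.

```python
def f1_score(p: float, r: float) -> float:
    """F1 = 2PR / (P + R)."""
    if p + r == 0.0:
        return 0.0  # assumption: define F1 as 0 when both P and R are 0
    return 2.0 * p * r / (p + r)
```

For example, with the illustrative inputs P = 0.8 and R = 0.6, the formula gives 2 × 0.48 / 1.4 ≈ 0.686, which lies closer to the smaller of the two values, as a harmonic mean should.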

The prediction results and the evaluation results fed back by the evaluation units are shown in Table 2 below:

Item	F1 value
The method of the invention	0.859
Average value of each method	0.515
Optimum value of each method	0.859

TABLE 2

Based on the same technical concept as the embodiment of the method for identifying the Chinese semantic relationship, the embodiment of the invention also provides a device for identifying the Chinese semantic relationship. The following describes the recognition device of the Chinese semantic relationship.

The device for identifying Chinese semantic relationships in this embodiment may comprise a judging module, an obtaining module, and an adjusting module.

The judging module can be configured to judge whether a preset semantic dictionary contains Chinese word pairs to be detected: if yes, determining the semantic relation of the Chinese word pair to be detected according to a preset semantic dictionary.

The obtaining module may be configured to obtain the first semantic relationship by using the word vectors of the Chinese word pair to be detected after the judging module judges that the preset semantic dictionary does not contain the Chinese word pair to be detected.

The adjusting module can be configured to obtain the second semantic relationship by using the word structure characteristics of the Chinese word pair to be detected, and adjust the first semantic relationship according to the second semantic relationship to obtain the final semantic relationship.

Further, the obtaining module in this embodiment may include a word vector obtaining unit, an extraction and construction unit, and a semantic relationship obtaining unit.

The word vector obtaining unit may be configured to obtain the word vectors of the Chinese word pair to be detected according to a preset word vector dictionary.

The extraction and construction unit can be configured to extract the characteristics of the word vectors and construct a combined vector according to the characteristics of the word vectors and the part-of-speech information of the Chinese word pairs to be detected.

The semantic relation obtaining unit can be configured to obtain the probability of each preset semantic relation corresponding to the combined vector by using a preset softmax classification model, and use the semantic relation with the maximum probability value as the first semantic relation of the Chinese word pair to be detected.
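The selection performed by the semantic relation obtaining unit can be sketched as follows. How the preset model turns the combined vector into per-relation scores is not described here, so the sketch starts from hypothetical raw scores; the function names and relation labels are likewise assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def first_relation(scores: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Return the most probable relation and the full probability table."""
    labels = list(scores)
    probs = dict(zip(labels, softmax([scores[label] for label in labels])))
    return max(probs, key=probs.get), probs
```

Returning the full probability table alongside the top label keeps the runner-up available for the later adjustment against the second semantic relationship.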

Further, in this embodiment, the extraction and construction unit may include a similarity calculation subunit, a difference vector calculation subunit, a part-of-speech acquisition subunit, and a fusion subunit.

The similarity calculation subunit may be configured to calculate the similarity between the word vectors corresponding to the Chinese word pair to be detected.

The difference vector calculation subunit may be configured to calculate the difference vector of the word vectors corresponding to the Chinese word pair to be detected.

The part-of-speech obtaining subunit may be configured to obtain the part of speech of each word in the Chinese word pair to be detected, and to encode the parts of speech to obtain the corresponding part-of-speech information.

The fusion subunit may be configured to fuse the similarity and the difference vector of the word vectors with the part-of-speech information of the Chinese word pair to be detected to form a combined vector.

Further, in this embodiment, the adjusting module may further include a judging unit. The judging unit may be configured to judge whether the first semantic relationship is consistent with the second semantic relationship and, if so, not to adjust the first semantic relationship; if the two are inconsistent, the semantic relation with the second-highest probability among the preset semantic relations corresponding to the combined vector is taken as the first semantic relationship of the Chinese word pair to be detected.

Further, in this embodiment, the word structure features include a part-of-speech feature, a special character feature, a structure feature, and a single-character feature; the adjusting module also comprises a part-of-speech judging unit, a special character judging unit, a structure judging unit, and a single-character judging unit.

The part-of-speech judging unit may be configured to judge whether a word pair of the Chinese words to be detected satisfies a preset part-of-speech judging condition, and if so, obtain a second semantic relationship according to the preset part-of-speech judging condition.

The special character judging unit may be configured to judge, when the part-of-speech judging unit cannot obtain the second semantic relationship, whether the Chinese word pair to be detected contains a preset special Chinese character. If so, the preset special Chinese character is deleted and the modified Chinese word pair is input to the structure judging unit; if not, the Chinese word pair to be detected is input directly to the structure judging unit.

The structure judging unit may be configured to judge whether the Chinese word pair to be detected satisfies a preset structure judging condition when the part-of-speech judging unit cannot obtain the second semantic relationship, and to obtain the second semantic relationship according to the preset structure judging condition when the condition is satisfied.

The single-character judging unit may be configured to judge whether the Chinese word pair to be detected satisfies a preset single-character judging condition when the structure judging unit cannot obtain the second semantic relationship, and to obtain the second semantic relationship according to the preset single-character judging condition when the condition is satisfied.
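The order in which the four judging units hand over to one another can be sketched as a simple cascade. The individual rules are passed in as callables because their exact conditions are preset elsewhere; the names `second_relation`, `strip_special`, and `Rule`, and the representation of the preset special characters as a collection of single characters, are assumptions made for illustration.

```python
from typing import Callable, Iterable, Optional

# A rule returns a relation label, or None when it cannot decide.
Rule = Callable[[str, str], Optional[str]]


def strip_special(word: str, special: Iterable[str]) -> str:
    """Delete every preset special character from a word."""
    special_set = set(special)
    return "".join(ch for ch in word if ch not in special_set)


def second_relation(word_a: str, word_b: str, pos_rule: Rule,
                    structure_rule: Rule, single_rule: Rule,
                    special_chars: Iterable[str]) -> Optional[str]:
    """Try part of speech, then structure, then the single-character rule."""
    relation = pos_rule(word_a, word_b)
    if relation is not None:
        return relation
    # Special characters are removed before the structure check.
    special = set(special_chars)
    word_a = strip_special(word_a, special)
    word_b = strip_special(word_b, special)
    relation = structure_rule(word_a, word_b)
    if relation is not None:
        return relation
    return single_rule(word_a, word_b)
```

Each stage runs only when the previous one returns no relation, so an earlier rule always takes precedence over a later one.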

The preset structure judgment conditions include: if the structural features of the two Chinese words in the Chinese word pair to be detected are ab and bc respectively, or a and ba respectively, the two words are in a whole-part relation; if the structural features of the two words are a and ac respectively, the two words are in a hypernym-hyponym (superordinate-subordinate) relation. Here a, b and c are single Chinese characters that differ from one another.
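The three structural patterns can be checked character by character, as in the sketch below. The function name and the labels "whole-part" and "hypernym-hyponym" are hypothetical, and the patterns are tested only in the word order given (first word, second word); whether the reversed order should also match is not stated and is left out.

```python
from typing import Optional


def structure_rule(word_a: str, word_b: str) -> Optional[str]:
    """Match the (ab, bc), (a, ba) and (a, ac) patterns; None if none applies."""
    # Pattern (ab, bc): two two-character words chained on a shared character.
    if len(word_a) == 2 and len(word_b) == 2:
        a, b = word_a
        b2, c = word_b
        if b == b2 and len({a, b, c}) == 3:
            return "whole-part"
    if len(word_a) == 1 and len(word_b) == 2:
        first, second = word_b
        if first != second:
            if second == word_a:   # pattern (a, ba)
                return "whole-part"
            if first == word_a:    # pattern (a, ac)
                return "hypernym-hyponym"
    return None
```

Latin letters can stand in for Chinese characters when exercising the function, for example `structure_rule("ab", "bc")` for the first pattern and `structure_rule("a", "ac")` for the third.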

The technical principle, the technical problems solved, and the technical effects produced by the device embodiment are similar to those of the method embodiment described above. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process and related description of the device for identifying Chinese semantic relationships may refer to the method for identifying Chinese semantic relationships, and are not repeated here.

Those skilled in the art will appreciate that the above-described device for identifying Chinese semantic relationships may also include other well-known structures such as processors, controllers, and memories. The memories include, but are not limited to, RAM, flash memory, ROM, PROM, EPROM, non-volatile memory, serial memory, parallel memory, or registers; the processors include, but are not limited to, CPLD/FPGA, DSP, ARM processors, and MIPS processors. These well-known structures are not shown so as not to unnecessarily obscure the embodiments of the present disclosure.

Those skilled in the art will appreciate that the modules in the devices in the embodiments may be adaptively changed and arranged in one or more devices different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing embodiment of the method for identifying Chinese semantic relationships, and are not described herein again.

The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in a server, client, or the like, according to embodiments of the present invention. The present invention may also be embodied as an apparatus or device program (e.g., a computer program and a computer program product) for carrying out a portion or all of the methods described herein. Such a program implementing the invention may be stored on a computer-readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.

Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims of the present invention, any of the claimed embodiments may be used in any combination.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.

So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims

1. A method for recognizing Chinese semantic relation is characterized by comprising the following steps:

2. The method according to claim 1, wherein the step of obtaining the first semantic relationship of the word vector of the word pair of the Chinese words to be detected specifically comprises:

3. The method according to claim 2, wherein the step of extracting the features of the word vector and constructing a combined vector according to the features of the word vector and the part-of-speech information of the word pair of the Chinese words to be detected specifically comprises:

and fusing the similarity and the difference vector of the word vector and the part-of-speech information of the word pair of the Chinese words to be detected to form a combined vector:

x = (W, COS_S, P₁, P₂)

wherein W = W(A) - W(B) is the difference vector of the two word vectors, COS_S is the cosine similarity of the word vectors of words A and B after dimensionality reduction, and P₁ and P₂ are the encoded values for word A and word B.

4. The method according to claim 2, wherein the step of adjusting the first semantic relationship according to the second semantic relationship to obtain the final semantic relationship specifically comprises:

5. The method according to any one of claims 1 to 4, wherein the word structure features include part-of-speech features, special character features, structure features, and single-character features; the step of acquiring the second semantic relationship by using the word structure features of the Chinese word pair to be detected specifically comprises the following steps:

when the Chinese word pair to be detected meets a preset single-character judgment condition, obtaining the second semantic relationship according to the preset single-character judgment condition;

6. An apparatus for recognizing Chinese semantic relations, the apparatus comprising:

7. The apparatus of claim 6, wherein the obtaining module comprises:

8. The apparatus of claim 7, wherein the extraction building unit comprises:

9. The apparatus of claim 7, wherein the adjusting module comprises a judging unit; the judging unit is configured to judge whether the first semantic relationship is consistent with the second semantic relationship and, if so, not to adjust the first semantic relationship; and if the two semantic relationships are inconsistent, to take the semantic relation with the second-highest probability among the preset semantic relations corresponding to the combined vector as the first semantic relationship of the Chinese word pair to be detected.

10. The apparatus according to any one of claims 6-9, wherein the word structure features comprise part-of-speech features, special character features, structure features, and single-character features; the adjusting module further comprises:

the structure judging unit is configured to judge whether the Chinese word pair to be detected meets a preset structure judging condition, and when the Chinese word pair to be detected meets the preset structure judging condition, a second semantic relationship is obtained according to the preset structure judging condition;

the single-character judging unit is configured to judge whether the Chinese word pair to be detected meets a preset single-character judging condition when the structure judging unit cannot obtain the second semantic relationship, and to obtain the second semantic relationship according to the preset single-character judging condition when the Chinese word pair to be detected meets the preset single-character judging condition;