CN109766551A - Method and system for determining polysemous word meaning

Info

Publication number: CN109766551A
Application number: CN201910015288.2A
Authority: CN
Inventors: 魏誉荧
Original assignee: Guangdong Genius Technology Co Ltd
Current assignee: Guangdong Genius Technology Co Ltd
Priority date: 2019-01-08
Filing date: 2019-01-08
Publication date: 2019-05-17
Anticipated expiration: 2039-01-08
Also published as: CN109766551B

Abstract

The invention provides a method and a system for determining polysemous word meaning. The method comprises the following steps: obtaining a corpus sample and generating a regular model from the corpus sample; acquiring polysemous words together with the different parts of speech and semantics corresponding to each polysemous word, and establishing a polysemous word library from the polysemous words, their parts of speech and their semantics; acquiring a user sentence; segmenting the user sentence by a word segmentation technique to obtain user segmented words; comparing the user segmented words with the polysemous word library to obtain a user polysemous word and its corresponding plurality of user polysemous word parts of speech and user polysemous word semantics; and, in combination with the user polysemous word parts of speech and semantics, matching the user sentence against the regular model to determine the target polysemous word part of speech and target polysemous word semantics that the user polysemous word has in the user sentence. Because the regular model is generated from corpus samples, the semantics of the polysemous words contained in a user sentence are determined according to the regular model, and ambiguity in understanding is avoided.

Description

Method and system for determining polysemous word meaning

Technical field

The present invention relates to the technical field of language analysis and processing, and in particular to a method and system for determining the semantics of polysemous words.

Background

With the rapid development of the Internet, people's lives are becoming more and more intelligent. It is increasingly common for an intelligent terminal to recognize the information a user inputs and then perform a corresponding operation, for example a keyword search. The accuracy with which the terminal recognizes the information input by the user therefore strongly affects the feedback the terminal gives.

However, Chinese contains a large number of polysemous words. When a sentence contains a polysemous word, the intelligent terminal may be unable to accurately recognize the semantics of that word in the sentence, and so fails to recognize the user's true intention. In addition, during translation, the choice of semantics for a polysemous word directly affects the accuracy of the translated text. A method and system for determining the semantics of polysemous words is therefore needed.

Summary of the invention

The object of the present invention is to provide a method and system for determining the semantics of polysemous words, in which a regular model is generated from corpus samples so that the semantics of the polysemous words contained in a user sentence are determined according to the regular model, and ambiguity in understanding is avoided.

The technical solution provided by the invention is as follows:

The present invention provides a method for determining the semantics of a polysemous word, comprising:

obtaining a corpus sample, and generating a regular model according to the corpus sample;

acquiring polysemous words and the different polysemous word parts of speech and polysemous word semantics corresponding to the polysemous words, and establishing a polysemous word library according to the polysemous words, the polysemous word parts of speech and the polysemous word semantics;

acquiring a user sentence;

segmenting the user sentence by a word segmentation technique to obtain user segmented words;

comparing the user segmented words with the polysemous word library to obtain a user polysemous word and a corresponding plurality of user polysemous word parts of speech and user polysemous word semantics;

in combination with the user polysemous word parts of speech and the user polysemous word semantics, matching the user sentence against the regular model, and determining the target polysemous word part of speech and target polysemous word semantics corresponding to the user polysemous word in the user sentence.

Further, obtaining the corpus sample and generating the regular model according to the corpus sample specifically comprises:

obtaining the corpus sample;

performing syntactic annotation and part-of-speech annotation on the corpus sample;

analyzing the syntactic annotation and the part-of-speech annotation to obtain a corpus subject in the corpus sample, subject-associated words related to the corpus subject, and the subject part of speech taken by the corpus subject;

generating the regular model according to the corpus subject, the subject-associated words and the subject part of speech.

Further, matching the user sentence against the regular model in combination with the user polysemous word parts of speech and the user polysemous word semantics, and determining the target polysemous word part of speech and target polysemous word semantics corresponding to the user polysemous word in the user sentence, specifically comprises:

in combination with the plurality of user polysemous word parts of speech and user polysemous word semantics, matching the user sentence against the regular model to obtain a plurality of matching results;

when, among the matching results, the result obtained with only one user polysemous word part of speech and its corresponding user polysemous word semantics is consistent, taking that matching user polysemous word part of speech and its corresponding user polysemous word semantics as the target polysemous word part of speech and the target polysemous word semantics.

Further, matching the user sentence against the regular model in combination with the user polysemous word parts of speech and the user polysemous word semantics, and determining the target polysemous word part of speech and target polysemous word semantics corresponding to the user polysemous word in the user sentence, further comprises:

when, among the matching results, the results obtained with several user polysemous word parts of speech and their corresponding user polysemous word semantics are consistent, selecting one of those user polysemous word parts of speech and its corresponding user polysemous word semantics as the target polysemous word part of speech and the target polysemous word semantics according to a matching degree, the matching degree being the proportion to which the user sentence matches the regular model.

Further, matching the user sentence against the regular model in combination with the plurality of user polysemous word parts of speech and user polysemous word semantics to obtain a plurality of matching results specifically comprises:

in combination with the user polysemous word parts of speech and the user polysemous word semantics, performing syntactic splitting on the user sentence to obtain a sentence subject in the user sentence, sentence-subject-associated words related to the sentence subject, and the sentence subject part of speech taken by the sentence subject;

matching the sentence subject, the sentence-subject-associated words and the sentence subject part of speech obtained for each of the plurality of user polysemous word parts of speech and user polysemous word semantics against the regular model respectively, to obtain a plurality of matching results.

The present invention also provides a system for determining the semantics of a polysemous word, comprising:

a model generation module, which obtains a corpus sample and generates a regular model according to the corpus sample;

a library establishing module, which acquires polysemous words and the different polysemous word parts of speech and polysemous word semantics corresponding to the polysemous words, and establishes a polysemous word library according to the polysemous words, the polysemous word parts of speech and the polysemous word semantics;

an acquisition module, which acquires a user sentence;

a word segmentation module, which segments the user sentence acquired by the acquisition module by a word segmentation technique to obtain user segmented words;

a comparison module, which compares the user segmented words obtained by the word segmentation module with the polysemous word library established by the library establishing module, to obtain a user polysemous word and a corresponding plurality of user polysemous word parts of speech and user polysemous word semantics;

a processing module, which, in combination with the user polysemous word parts of speech and the user polysemous word semantics obtained by the comparison module, matches the user sentence against the regular model generated by the model generation module, and determines the target polysemous word part of speech and target polysemous word semantics corresponding to the user polysemous word in the user sentence.

Further, the model generation module specifically comprises:

a sample acquisition unit, which obtains the corpus sample;

an annotation unit, which performs syntactic annotation and part-of-speech annotation on the corpus sample obtained by the sample acquisition unit;

an analysis unit, which analyzes the syntactic annotation and the part-of-speech annotation produced by the annotation unit to obtain a corpus subject in the corpus sample, subject-associated words related to the corpus subject, and the subject part of speech taken by the corpus subject;

a model generation unit, which generates the regular model according to the corpus subject, the subject-associated words and the subject part of speech obtained by the analysis unit.

Further, the processing module specifically comprises:

a matching unit, which, in combination with the plurality of user polysemous word parts of speech and user polysemous word semantics obtained by the comparison module, matches the user sentence against the regular model generated by the model generation module to obtain a plurality of matching results;

a processing unit, which, when the result obtained with only one user polysemous word part of speech and its corresponding user polysemous word semantics is consistent among the matching results obtained by the matching unit, takes that matching user polysemous word part of speech and its corresponding user polysemous word semantics as the target polysemous word part of speech and the target polysemous word semantics.

Further, in the processing module:

the processing unit, when the results obtained with several user polysemous word parts of speech and their corresponding user polysemous word semantics are consistent among the matching results obtained by the matching unit, selects one of those user polysemous word parts of speech and its corresponding user polysemous word semantics as the target polysemous word part of speech and the target polysemous word semantics according to a matching degree, the matching degree being the proportion to which the user sentence matches the regular model.

Further, the matching unit specifically comprises:

an analysis subunit, which, in combination with the user polysemous word parts of speech and the user polysemous word semantics obtained by the comparison module, performs syntactic splitting on the user sentence to obtain a sentence subject in the user sentence, sentence-subject-associated words related to the sentence subject, and the sentence subject part of speech taken by the sentence subject;

a matching subunit, which matches the sentence subject, the sentence-subject-associated words and the sentence subject part of speech that the analysis subunit obtains for each of the plurality of user polysemous word parts of speech and user polysemous word semantics from the comparison module against the regular model generated by the model generation module respectively, to obtain a plurality of matching results.

The method and system for determining the semantics of a polysemous word provided by the invention can bring at least one of the following beneficial effects:

1. In the present invention, a regular model is generated from the syntax and parts of speech obtained by analyzing corpus samples, and the semantics of the polysemous words contained in a user sentence are determined with the resulting regular model.

2. In the present invention, the multiple parts of speech and semantics of a polysemous word are substituted into the user sentence one by one and analyzed, and each is then matched against the regular model, so that the most suitable semantics for the polysemous word is determined.

Brief description of the drawings

The above characteristics, technical features, advantages and implementations of the method and system for determining the semantics of a polysemous word are further described below, in a clear and easily understood way, by describing preferred embodiments with reference to the drawings.

Fig. 1 is a flow chart of one embodiment of the method for determining the semantics of a polysemous word according to the present invention;

Fig. 2 is a flow chart of another embodiment of the method for determining the semantics of a polysemous word according to the present invention;

Fig. 3 is a flow chart of another embodiment of the method for determining the semantics of a polysemous word according to the present invention;

Fig. 4 is a flow chart of another embodiment of the method for determining the semantics of a polysemous word according to the present invention;

Fig. 5 is a flow chart of another embodiment of the method for determining the semantics of a polysemous word according to the present invention;

Fig. 6 is a flow chart of another embodiment of the method for determining the semantics of a polysemous word according to the present invention;

Fig. 7 is a structural schematic diagram of one embodiment of the system for determining the semantics of a polysemous word according to the present invention;

Fig. 8 is a structural schematic diagram of another embodiment of the system for determining the semantics of a polysemous word according to the present invention.

Description of reference numerals:

1000: system for determining the semantics of a polysemous word

1100: model generation module; 1110: sample acquisition unit; 1120: annotation unit; 1130: analysis unit; 1140: model generation unit

1200: library establishing module; 1300: acquisition module; 1400: word segmentation module; 1500: comparison module

1600: processing module; 1610: matching unit; 1611: analysis subunit; 1612: matching subunit; 1620: processing unit

Detailed description of the embodiments

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, specific embodiments of the invention are described below with reference to the accompanying drawings. Obviously, the drawings in the following description show only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings, and other embodiments, from these drawings without creative effort.

To keep the drawings simple, each figure shows only schematically the parts relevant to the invention; they do not represent the actual structure of a product. In addition, to keep the drawings simple and easy to understand, where several components in a figure have the same structure or function, only one of them is drawn schematically, or only one of them is labeled. Herein, "one" does not only mean "only this one"; it can also mean "more than one".

In one embodiment of the present invention, as shown in Fig. 1, a method for determining the semantics of a polysemous word comprises:

S100: obtain a corpus sample, and generate a regular model according to the corpus sample.

Specifically, a corpus sample is obtained and its syntactic structure and the parts of speech of its words are analyzed, so that a corresponding regular model is generated from the corpus sample.

S200: acquire polysemous words and the different polysemous word parts of speech and polysemous word semantics corresponding to the polysemous words, and establish a polysemous word library according to the polysemous words, the polysemous word parts of speech and the polysemous word semantics.

Specifically, polysemous words and their corresponding parts of speech and semantics are acquired. Because a polysemous word has multiple semantics, and each part of speech corresponds to a semantics, one polysemous word corresponds to multiple groups of part of speech and semantics. The polysemous word library is established from the polysemous words, their parts of speech and their semantics; the library records the correspondence between each part of speech and its semantics, and the correspondence between each polysemous word and its parts of speech and semantics.
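The correspondences described for S200 can be illustrated with a small in-memory structure. This is only a sketch under assumptions: the patent does not specify a storage format, and the names `Sense` and `PolysemousLibrary`, as well as the sample entry for 花 ("flower" as a noun, "to spend" as a verb), are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    """One (part of speech, semantics) group of a polysemous word."""
    pos: str
    meaning: str


class PolysemousLibrary:
    """Maps each polysemous word to its groups of part of speech and semantics."""

    def __init__(self):
        self._entries = defaultdict(list)

    def add(self, word, pos, meaning):
        sense = Sense(pos, meaning)
        # Keep each (pos, meaning) group only once per word.
        if sense not in self._entries[word]:
            self._entries[word].append(sense)

    def senses(self, word):
        # Words that are not in the library yield an empty list.
        return list(self._entries.get(word, []))

    def __contains__(self, word):
        return word in self._entries


# Hypothetical sample entry: one word, two sense groups.
library = PolysemousLibrary()
library.add("花", "noun", "flower")
library.add("花", "verb", "to spend")
```

Keeping part of speech and semantics together in one `Sense` object mirrors the statement that the two correspond to each other, so a lookup always returns them as a group.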

S300: acquire a user sentence.

S400: segment the user sentence by a word segmentation technique to obtain user segmented words.

Specifically, a user sentence is acquired and segmented by a word segmentation technique, so that the user sentence is split into user segmented words such as single characters and words.
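The patent does not name a particular word segmentation technique. As one possible illustration, the sketch below uses forward maximum matching over a toy vocabulary; the function name `segment`, the `max_len` parameter and the vocabulary are assumptions made for this example.

```python
def segment(sentence, vocabulary, max_len=4):
    """Forward maximum matching: at each position take the longest known word."""
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy vocabulary (assumption). The sentence means "I spent a lot of money".
vocabulary = {"我", "想", "买", "花", "很多", "钱", "了"}
tokens = segment("我花了很多钱", vocabulary)
```

With this vocabulary the sentence is split into the single characters 我, 花, 了, the two-character word 很多, and 钱, which corresponds to the mix of characters and words mentioned above.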

S500: compare the user segmented words with the polysemous word library to obtain a user polysemous word and a corresponding plurality of user polysemous word parts of speech and user polysemous word semantics.

Specifically, the user segmented words are compared one by one with the polysemous words recorded in the polysemous word library. If a user segmented word matches an entry, that user segmented word is determined to be a user polysemous word, and the plurality of user polysemous word parts of speech and user polysemous word semantics corresponding to it are determined from the correspondences in the polysemous word library.
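The comparison in S500 amounts to a lookup of each segmented word in the library. A minimal sketch, assuming the library is a plain dictionary from word to a list of (part of speech, semantics) pairs; the function name `find_polysemous_words` and the sample data are hypothetical.

```python
def find_polysemous_words(tokens, library):
    """Return {token: sense groups} for every token found in the library."""
    found = {}
    for token in tokens:
        if token in library:
            # Copy the list so callers cannot modify the library by accident.
            found[token] = list(library[token])
    return found


# Hypothetical library entry and segmented user sentence.
library = {"花": [("noun", "flower"), ("verb", "to spend")]}
tokens = ["我", "花", "了", "很多", "钱"]
candidates = find_polysemous_words(tokens, library)
```

Here only 花 is reported as a user polysemous word, together with both of its candidate groups; the remaining steps decide which group applies in the sentence.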

S600: in combination with the user polysemous word parts of speech and the user polysemous word semantics, match the user sentence against the regular model, and determine the target polysemous word part of speech and target polysemous word semantics corresponding to the user polysemous word in the user sentence.

Specifically, the user sentence is analyzed in combination with the user polysemous word parts of speech and semantics and matched against the regular model; the user polysemous word part of speech and semantics whose matching result is consistent are determined to be the target polysemous word part of speech and target polysemous word semantics of the user polysemous word in the user sentence.

In this embodiment, a regular model is generated from the syntax and parts of speech obtained by analyzing corpus samples. The multiple user polysemous word parts of speech and semantics of the user polysemous word are substituted into the user sentence one by one and analyzed, and each is then matched against the regular model, so that the most suitable user polysemous word semantics for the user polysemous word is determined.

Another embodiment of the present invention is a preferred embodiment of the above embodiment and, as shown in Fig. 2, comprises:

Step S100, obtaining a corpus sample and generating a regular model according to the corpus sample, specifically comprises:

S110: obtain the corpus sample.

S120: perform syntactic annotation and part-of-speech annotation on the corpus sample.

Specifically, a large number of corpus samples are obtained and segmented by a word segmentation technique, and the sentence structure of each corpus sample is analyzed to obtain the relationships among its characters, words and sentences, so that syntactic annotation and part-of-speech annotation are performed on the corpus sample.

S130: analyze the syntactic annotation and the part-of-speech annotation to obtain a corpus subject in the corpus sample, subject-associated words related to the corpus subject, and the subject part of speech taken by the corpus subject.

S140: generate the regular model according to the corpus subject, the subject-associated words and the subject part of speech.

Specifically, the syntactic annotation and the part-of-speech annotation are analyzed; key characters, words and sentences in the corpus sample are taken as the corpus subject, and the subject-associated words related to the corpus subject and the subject part of speech taken by the corpus subject are determined. The corresponding regular model is generated from the determined corpus subject, subject-associated words and subject part of speech.
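Steps S130 and S140 can be sketched as follows. The patent does not define the internal form of the regular model, so this example makes several assumptions: each annotated sample is a list of (word, part of speech, head index) tuples plus the index of the corpus subject, a word counts as subject-associated when it depends on the subject or governs it, and the model is a mapping from (subject, subject part of speech) to the set of associated words. The function name `build_regular_model` is hypothetical.

```python
from collections import defaultdict


def build_regular_model(annotated_corpus):
    """Collect, per (subject, part of speech), the words seen associated with it.

    Each sample is (tokens, subject_index), where tokens is a list of
    (word, pos, head_index) and head_index is -1 for the root word.
    This input format is an assumption made for the sketch.
    """
    model = defaultdict(set)
    for tokens, subject_index in annotated_corpus:
        subject, subject_pos, subject_head = tokens[subject_index]
        for index, (word, _pos, head) in enumerate(tokens):
            if index == subject_index:
                continue
            # A word is associated if it depends on the subject or governs it.
            if head == subject_index or index == subject_head:
                model[(subject, subject_pos)].add(word)
    return dict(model)


corpus = [
    # "我 买 花" (I buy flowers): 花 is a noun governed by 买.
    ([("我", "pronoun", 1), ("买", "verb", -1), ("花", "noun", 1)], 2),
    # "我 花 钱" (I spend money): 花 is a verb governing 我 and 钱.
    ([("我", "pronoun", 1), ("花", "verb", -1), ("钱", "noun", 1)], 1),
]
model = build_regular_model(corpus)
```

In this toy corpus the noun reading of 花 ends up associated with 买, and the verb reading with 我 and 钱, which is the kind of contrast later matching steps can exploit.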

S300: acquire a user sentence.

In this embodiment, a large number of corpus samples are obtained and their syntax and parts of speech are analyzed to generate the corresponding regular model, which makes it possible to determine the semantics of the polysemous words contained in a user sentence quickly and accurately in the subsequent steps.

Another embodiment of the present invention is a preferred embodiment of the above embodiment and, as shown in Fig. 3, comprises:

S300: acquire a user sentence.

Step S600, matching the user sentence against the regular model in combination with the user polysemous word parts of speech and the user polysemous word semantics, and determining the target polysemous word part of speech and target polysemous word semantics corresponding to the user polysemous word in the user sentence, specifically comprises:

S610: in combination with the plurality of user polysemous word parts of speech and user polysemous word semantics, match the user sentence against the regular model to obtain a plurality of matching results.

Specifically, each user polysemous word semantics and its corresponding user polysemous word part of speech are treated as one group. For each group, the user sentence combined with that group's semantics and part of speech is matched against the regular model, which yields one matching result. Because the user polysemous word corresponds to multiple groups of semantics and part of speech, a plurality of matching results are obtained.

S620: when, among the matching results, the result obtained with only one user polysemous word part of speech and its corresponding user polysemous word semantics is consistent, that matching user polysemous word part of speech and its corresponding user polysemous word semantics are the target polysemous word part of speech and the target polysemous word semantics.

Specifically, if only one of the plurality of matching results is consistent, the user polysemous word part of speech and semantics corresponding to that matching result are the target polysemous word part of speech and the target polysemous word semantics, that is, the part of speech and semantics of the user polysemous word contained in the user sentence.
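The selection rule of S620 can be written down in a few lines. This sketch assumes that each matching result has already been reduced to a boolean (consistent or not) per group of part of speech and semantics; the function name `select_unique_match` and the sample results are hypothetical.

```python
def select_unique_match(results):
    """Return the single sense group whose result is consistent, else None.

    results maps (pos, meaning) -> bool, True when the match is consistent.
    """
    matched = [sense for sense, ok in results.items() if ok]
    return matched[0] if len(matched) == 1 else None


# Hypothetical matching results for the two groups of one polysemous word.
results = {("noun", "flower"): False, ("verb", "to spend"): True}
target = select_unique_match(results)
```

Returning `None` when zero or several groups are consistent leaves room for a separate rule to handle the case of multiple consistent results.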

In this embodiment, the multiple user polysemous word parts of speech and semantics of the user polysemous word are substituted into the user sentence one by one and analyzed, and each is then matched against the regular model, so that the most suitable user polysemous word semantics for the user polysemous word is determined.

Another embodiment of the present invention is a preferred embodiment of the above embodiment and, as shown in Fig. 4, comprises:

S300: acquire a user sentence.

S630: when, among the matching results, the results obtained with several user polysemous word parts of speech and their corresponding user polysemous word semantics are consistent, select one of those user polysemous word parts of speech and its corresponding user polysemous word semantics as the target polysemous word part of speech and the target polysemous word semantics according to a matching degree, the matching degree being the proportion to which the user sentence matches the regular model.

Specifically, if several of the plurality of matching results are consistent, the target polysemous word part of speech and target polysemous word semantics are determined according to the matching degree, which is the proportion to which the user sentence matches the regular model. For example, if one of the matching results is a 100% match and another is a 50% match, the user polysemous word part of speech and semantics corresponding to the 100% match are selected as the target polysemous word part of speech and the target polysemous word semantics.
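The 100% versus 50% example can be reproduced with a small calculation. The patent defines the matching degree only as a proportion, so the concrete formula here is an assumption: the fraction of the sentence's associated words that also appear in the model for a given group. The function names and sample data are hypothetical.

```python
def matching_degree(sentence_words, model_words):
    """Fraction of the sentence's associated words that the model also contains."""
    if not sentence_words:
        return 0.0
    hits = sum(1 for word in sentence_words if word in model_words)
    return hits / len(sentence_words)


def select_by_degree(degrees):
    """Pick the sense group with the highest matching degree."""
    return max(degrees, key=degrees.get)


# Associated words found in the user sentence "我 花 钱" around 花.
sentence_words = ["我", "钱"]
degrees = {
    ("noun", "flower"): matching_degree(sentence_words, {"买", "我"}),
    ("verb", "to spend"): matching_degree(sentence_words, {"我", "钱"}),
}
target = select_by_degree(degrees)
```

In this sketch the noun group reaches a degree of 0.5 and the verb group 1.0, so the verb group is selected. When two groups tie, `max` returns the first one encountered; the patent does not say how ties should be broken.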

Another embodiment of the present invention is a preferred embodiment of the above embodiment and, as shown in Fig. 5, comprises:

S300: acquire a user sentence.

Step S610, matching the user sentence against the regular model in combination with the plurality of user polysemous word parts of speech and user polysemous word semantics to obtain a plurality of matching results, specifically comprises:

S611: in combination with the user polysemous word parts of speech and the user polysemous word semantics, perform syntactic splitting on the user sentence to obtain a sentence subject in the user sentence, sentence-subject-associated words related to the sentence subject, and the sentence subject part of speech taken by the sentence subject.

Specifically, each user polysemous word semantics and its corresponding user polysemous word part of speech are treated as one group. Each group is substituted into the user sentence in turn, and syntactic splitting is then performed on the user sentence to determine the sentence subject in the user sentence, the sentence-subject-associated words related to the sentence subject, and the sentence subject part of speech taken by the sentence subject.

S612: match the sentence subject, the sentence-subject-associated words and the sentence subject part of speech obtained for each of the plurality of user polysemous word parts of speech and user polysemous word semantics against the regular model respectively, to obtain a plurality of matching results.

Specifically, the sentence subject, the sentence-subject-associated words and the sentence subject part of speech are matched against the regular model. Because the user polysemous word corresponds to multiple groups of semantics and part of speech, a plurality of matching results are obtained.
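The loop over groups in S611 and S612 might look like the sketch below. It rests on assumptions that are not in the patent: syntactic splitting is replaced by a stand-in that treats the tokens adjacent to the polysemous word as its associated words, the regular model is a mapping from (subject, part of speech) to a set of associated words, and each matching result is expressed as a proportion. The function names `split_sentence` and `match_all_senses` are hypothetical.

```python
def split_sentence(tokens, index, pos):
    """Stand-in for syntactic splitting: adjacent tokens act as associated words."""
    neighbours = tokens[max(0, index - 1):index] + tokens[index + 1:index + 2]
    return tokens[index], neighbours, pos


def match_all_senses(tokens, index, senses, model):
    """Substitute each (pos, meaning) group in turn and match it against the model."""
    results = {}
    for pos, meaning in senses:
        subject, associated, subject_pos = split_sentence(tokens, index, pos)
        known = model.get((subject, subject_pos), set())
        hits = [word for word in associated if word in known]
        # One matching result per group, expressed as a proportion.
        results[(pos, meaning)] = len(hits) / len(associated) if associated else 0.0
    return results


# Hypothetical regular model and user sentence "我 花 钱" (I spend money).
model = {("花", "noun"): {"买", "朵"}, ("花", "verb"): {"我", "钱", "时间"}}
tokens = ["我", "花", "钱"]
senses = [("noun", "flower"), ("verb", "to spend")]
results = match_all_senses(tokens, 1, senses, model)
```

A real implementation would obtain the associated words from an actual syntactic analysis rather than from adjacency; the adjacency rule is used here only to keep the example self-contained.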

In this embodiment, each user polysemous word semantics of the user polysemous word is substituted into the user sentence in turn, the user sentence is analyzed, and the result is matched against the regular model, so that the semantics of the user polysemous word contained in the user sentence is determined.

Another embodiment of the present invention is a preferred embodiment of the above embodiment and, as shown in Fig. 6, comprises:

S300: acquire a user sentence.

The specific operation of each step in this embodiment has been described in detail in the corresponding method embodiments above and is therefore not repeated here.

In one embodiment of the present invention, as shown in Fig. 7, a system 1000 for determining the semantics of a polysemous word comprises:

A model generation module 1100, which obtains a corpus sample and generates a regular model according to the corpus sample.

Specifically, the model generation module 1100 obtains a corpus sample and analyzes its syntactic structure and the parts of speech of its words, so that a corresponding regular model is generated from the corpus sample.

A library establishing module 1200, which acquires polysemous words and the different polysemous word parts of speech and polysemous word semantics corresponding to the polysemous words, and establishes a polysemous word library according to the polysemous words, the polysemous word parts of speech and the polysemous word semantics.

Specifically, the library establishing module 1200 acquires polysemous words and their corresponding parts of speech and semantics. Because a polysemous word has multiple semantics, and each part of speech corresponds to a semantics, one polysemous word corresponds to multiple groups of part of speech and semantics. The library establishing module 1200 establishes the polysemous word library from the polysemous words, their parts of speech and their semantics; the library records the correspondence between each part of speech and its semantics, and the correspondence between each polysemous word and its parts of speech and semantics.

An acquisition module 1300, which acquires a user sentence.

A word segmentation module 1400, which segments the user sentence acquired by the acquisition module 1300 by a word segmentation technique to obtain user segmented words.

Specifically, the acquisition module 1300 acquires a user sentence, and the word segmentation module 1400 segments it by a word segmentation technique, so that the user sentence is split into user segmented words such as single characters and words.

A comparison module 1500, which compares the user segmented words obtained by the word segmentation module 1400 with the polysemous word library to obtain a user polysemous word and a corresponding plurality of user polysemous word parts of speech and user polysemous word semantics.

Specifically, the comparison module 1500 compares the user segmented words one by one with the polysemous words recorded in the polysemous word library. If a user segmented word matches an entry, that user segmented word is determined to be a user polysemous word, and the plurality of user polysemous word parts of speech and user polysemous word semantics corresponding to it are determined from the correspondences in the polysemous word library.

A processing module 1600, which, in combination with the user polysemous word parts of speech and the user polysemous word semantics obtained by the comparison module 1500, matches the user sentence against the regular model and determines the target polysemous word part of speech and target polysemous word semantics corresponding to the user polysemous word in the user sentence.

Specifically, the processing module 1600 analyzes the user sentence in combination with the user polysemous word parts of speech and semantics and matches it against the regular model; the user polysemous word part of speech and semantics whose matching result is consistent are determined to be the target polysemous word part of speech and target polysemous word semantics of the user polysemous word in the user sentence.
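The way modules 1100 to 1600 pass data to one another can be illustrated by wiring them together as plain callables. This is a structural sketch only: the class name `PolysemySystem`, the `run` method and the toy stand-ins passed to it (including whitespace splitting in place of real word segmentation) are assumptions, not the patent's implementation.

```python
class PolysemySystem:
    """Wires modules 1100-1600 together; each module is a plain callable here."""

    def __init__(self, build_model, build_library, segment, compare, process):
        self.build_model = build_model      # model generation module 1100
        self.build_library = build_library  # library establishing module 1200
        self.segment = segment              # word segmentation module 1400
        self.compare = compare              # comparison module 1500
        self.process = process              # processing module 1600

    def run(self, corpus, entries, sentence):
        # The sentence argument plays the role of acquisition module 1300.
        model = self.build_model(corpus)
        library = self.build_library(entries)
        tokens = self.segment(sentence)
        candidates = self.compare(tokens, library)
        return self.process(tokens, candidates, model)


# Toy stand-ins so the wiring can be exercised (all hypothetical).
system = PolysemySystem(
    build_model=lambda corpus: {("花", "verb"): {"钱"}, ("花", "noun"): {"买"}},
    build_library=lambda entries: dict(entries),
    segment=lambda sentence: sentence.split(),
    compare=lambda tokens, library: {t: library[t] for t in tokens if t in library},
    process=lambda tokens, candidates, model: {
        word: next(
            (s for s in senses if model.get((word, s[0]), set()) & set(tokens)),
            None,
        )
        for word, senses in candidates.items()
    },
)
result = system.run([], {"花": [("noun", "flower"), ("verb", "to spend")]}, "我 花 钱")
```

Passing the modules in as callables keeps each one replaceable, which matches the module-by-module description of the system without committing to any particular model or segmentation technique.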

Another embodiment of the invention is the optimal enforcement example of the above embodiments, as shown in Figure 8, comprising:

The model generation module 1100 specifically includes:

Sample acquisition unit 1110 obtains the corpus sample.

Annotation unit 1120, configured to perform syntactic annotation and part-of-speech tagging on the corpus sample obtained by the sample acquisition unit 1110.

Specifically, the sample acquisition unit 1110 obtains a large number of corpus samples. The annotation unit 1120 segments the corpus samples using a word segmentation technique and analyzes their sentence structure to obtain the relationships among characters, words, and sentences, and on that basis performs syntactic annotation and part-of-speech tagging on the corpus samples.

Analysis unit 1130, configured to analyze the syntactic annotation and the part-of-speech tagging obtained by the annotation unit 1120 to obtain the corpus subject in the corpus sample, the subject-related words associated with the corpus subject, and the subject part of speech taken by the corpus subject.

Model generation unit 1140, configured to generate the regular model according to the corpus subject, the subject-related words, and the subject part of speech obtained by the analysis unit 1130.

Specifically, the analysis unit 1130 analyzes the syntactic annotation and the part-of-speech tagging, takes the key characters, words, and sentences in the corpus sample as the corpus subject, and determines the subject-related words associated with the corpus subject and the subject part of speech taken by the corpus subject. The model generation unit 1140 then generates the corresponding regular model according to the determined corpus subject, subject-related words, and subject part of speech.
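
As a rough illustration of the model generation step, the following Python sketch assumes that annotation and analysis have already reduced each corpus sample to a (subject, subject part of speech, related words) triple, and it represents the regular model as a plain dictionary keyed by subject and part of speech. The source does not specify the internal form of the regular model, so the triple format, the dictionary layout, and the name `build_regular_model` are all assumptions.

```python
from collections import defaultdict


def build_regular_model(analyzed_samples):
    """Merge analyzed corpus samples into a simple lookup structure.

    analyzed_samples: iterable of (subject, subject_pos, related_words) triples,
    assumed to come from the annotation and analysis steps.
    Returns a dict: (subject, subject_pos) -> set of related words seen with it.
    """
    model = defaultdict(set)
    for subject, subject_pos, related_words in analyzed_samples:
        model[(subject, subject_pos)].update(related_words)
    return dict(model)


# Hypothetical analyzed samples; the English entries are for illustration only.
samples = [
    ("book", "verb", ["table", "hotel"]),
    ("book", "noun", ["read", "library"]),
    ("book", "verb", ["flight"]),
]
regular_model = build_regular_model(samples)
```

Samples that share a subject and part of speech are merged, so the entry for ("book", "verb") collects the related words from both verb samples.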

Acquisition module 1300, configured to obtain the user sentence.

Comparison module 1500, configured to compare the user segmented words obtained by the word segmentation module 1400 with the polysemous word library established by the library building module 1200, to obtain the user polysemous word and the corresponding multiple user polysemous word parts of speech and user polysemous word semantics.

Processing module 1600, configured to match the user sentence against the regular model generated by the model generation module 1100, in combination with the user polysemous word parts of speech and the user polysemous word semantics obtained by the comparison module 1500, and to determine the target polysemous word part of speech and the target polysemous word semantics corresponding to the user polysemous word in the user sentence.

The processing module 1600 specifically includes:

Matching unit 1610, configured to match the user sentence against the regular model generated by the model generation module 1100, in combination with the multiple user polysemous word parts of speech and user polysemous word semantics obtained by the comparison module 1500, to obtain multiple matching results.

Specifically, the matching unit 1610 matches the user sentence against the regular model in combination with the multiple user polysemous word parts of speech and user polysemous word semantics. Each user polysemous word semantics, together with its corresponding user polysemous word part of speech, is treated as one group, and each group, combined with the user sentence, is matched against the regular model to yield one matching result. Because a user polysemous word corresponds to multiple such groups, multiple matching results are obtained.

Processing unit 1620, configured such that, when exactly one user polysemous word part of speech and its corresponding user polysemous word semantics yield a consistent result among the matching results obtained by the matching unit 1610, that user polysemous word part of speech and its corresponding user polysemous word semantics are taken as the target polysemous word part of speech and the target polysemous word semantics.

Specifically, if only one of the multiple matching results is consistent, the processing unit 1620 determines the user polysemous word part of speech and user polysemous word semantics of that matching result to be the target polysemous word part of speech and the target polysemous word semantics, that is, the part of speech and semantics of the user polysemous word contained in the user sentence.

The processing module 1600 further includes:

The processing unit 1620, further configured such that, when multiple user polysemous word parts of speech and their corresponding user polysemous word semantics yield consistent results among the matching results obtained by the matching unit 1610, one user polysemous word part of speech and its corresponding user polysemous word semantics are selected according to the matching degree as the target polysemous word part of speech and the target polysemous word semantics, the matching degree being the proportion of the user sentence that matches the regular model.

Specifically, if several of the multiple matching results are consistent, the processing unit 1620 determines the target polysemous word part of speech and the target polysemous word semantics according to the matching degree, which is the proportion of the user sentence that matches the regular model. For example, if one of the matching results is a 100% match and another is a 50% match, the user polysemous word part of speech and user polysemous word semantics corresponding to the 100% match are selected as the target polysemous word part of speech and the target polysemous word semantics.
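
The two selection rules of the processing unit can be sketched in Python as follows. The sketch assumes each matching result is a (part of speech, semantics, matching degree) tuple and treats a result as consistent when its matching degree is greater than zero; that threshold, the tuple layout, and the name `select_target_sense` are assumptions, and the source does not say what happens when no result is consistent or when two results tie.

```python
def select_target_sense(match_results):
    """Pick the target (part of speech, semantics) from the matching results.

    match_results: list of (part_of_speech, meaning, matching_degree) tuples,
    where matching_degree is a ratio between 0.0 and 1.0.
    """
    # Assumption: a result counts as "consistent" when its degree is above zero.
    consistent = [r for r in match_results if r[2] > 0.0]
    if not consistent:
        return None  # case not covered by the source
    # With a single consistent result, max() simply returns it.
    # With several, the highest matching degree wins (first one on a tie).
    return max(consistent, key=lambda r: r[2])


# The 100% versus 50% example from the description, with illustrative senses.
example_results = [("verb", "to reserve", 1.0), ("noun", "a written work", 0.5)]
target = select_target_sense(example_results)
```

With these inputs `target` is the verb sense, because its matching degree of 1.0 exceeds the 0.5 of the noun sense.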

The matching unit 1610 specifically includes:

Analysis subunit 1611, configured to perform syntactic splitting on the user sentence in combination with the user polysemous word parts of speech and the user polysemous word semantics obtained by the comparison module 1500, to obtain the sentence subject in the user sentence, the sentence-subject-related words associated with the sentence subject, and the sentence subject part of speech taken by the sentence subject.

Specifically, the analysis subunit 1611 takes the multiple user polysemous word parts of speech and user polysemous word semantics and treats each user polysemous word semantics together with its corresponding user polysemous word part of speech as one group. It substitutes each group into the user sentence one at a time and then performs syntactic splitting on the user sentence to determine the sentence subject in the user sentence, the sentence-subject-related words associated with the sentence subject, and the sentence subject part of speech taken by the sentence subject.

Matching subunit 1612, configured to match the sentence subject, the sentence-subject-related words, and the sentence subject part of speech, which the analysis subunit 1611 obtains for each of the multiple user polysemous word parts of speech and user polysemous word semantics obtained by the comparison module 1500, against the regular model generated by the model generation module 1100, respectively, to obtain multiple matching results.
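
The division of work between the two subunits can be illustrated with a minimal Python sketch. It makes several simplifying assumptions that are not in the source: the user polysemous word itself is taken as the sentence subject, every other segmented word is taken as a sentence-subject-related word, the regular model is a dictionary from (subject, part of speech) to a set of related words, and the matching degree is the share of related words found in the model. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentenceAnalysis:
    subject: str
    subject_pos: str
    meaning: str
    related_words: tuple


def analyze_sentence(segmented_words, polysemous_word, senses):
    """Analysis step: produce one analysis per (part of speech, meaning) group."""
    related = tuple(w for w in segmented_words if w != polysemous_word)
    return [SentenceAnalysis(polysemous_word, pos, meaning, related)
            for pos, meaning in senses]


def match_analyses(analyses, model):
    """Matching step: compare each analysis with the model, one result per group."""
    results = []
    for a in analyses:
        known = model.get((a.subject, a.subject_pos), set())
        hits = sum(1 for w in a.related_words if w in known)
        degree = hits / len(a.related_words) if a.related_words else 0.0
        results.append((a.subject_pos, a.meaning, degree))
    return results


# Illustrative model, senses, and segmented user sentence.
model = {("book", "verb"): {"table", "hotel"}, ("book", "noun"): {"read", "library"}}
senses = [("noun", "a written work"), ("verb", "to reserve")]
analyses = analyze_sentence(["book", "table", "tonight"], "book", senses)
results = match_analyses(analyses, model)
```

Each candidate sense yields its own matching result, so the verb reading of "book" scores higher here than the noun reading; a real syntactic splitter would replace the simplistic `analyze_sentence`.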

In this embodiment, a large number of corpus samples are obtained and their syntax and parts of speech are analyzed. Each user polysemous word semantics of the user polysemous word is substituted into the user sentence one at a time, the user sentence is analyzed, and the result is matched against the regular model, so that the semantics of the polysemous word contained in the user sentence can subsequently be determined quickly and accurately.

It should be noted that the above embodiments can be freely combined as needed. The above are only preferred embodiments of the invention. It should be pointed out that those of ordinary skill in the art can make several improvements and refinements without departing from the principle of the invention, and these improvements and refinements shall also fall within the scope of protection of the invention.

Claims

1. A method for determining the semantics of a polysemous word, characterized by comprising:

Obtaining a corpus sample, and generating a regular model according to the corpus sample;

Obtaining polysemous words and the different polysemous word parts of speech and polysemous word semantics corresponding to the polysemous words, and establishing a polysemous word library according to the polysemous words, the polysemous word parts of speech, and the polysemous word semantics;

Obtaining a user sentence;

Segmenting the user sentence using a word segmentation technique to obtain user segmented words;

Comparing the user segmented words with the polysemous word library to obtain a user polysemous word and the corresponding multiple user polysemous word parts of speech and user polysemous word semantics;

Matching the user sentence against the regular model in combination with the user polysemous word parts of speech and the user polysemous word semantics, and determining the target polysemous word part of speech and the target polysemous word semantics corresponding to the user polysemous word in the user sentence.

2. The method for determining the semantics of a polysemous word according to claim 1, characterized in that obtaining the corpus sample and generating the regular model according to the corpus sample specifically comprises:

Obtaining the corpus sample;

Performing syntactic annotation and part-of-speech tagging on the corpus sample;

Analyzing the syntactic annotation and the part-of-speech tagging to obtain the corpus subject in the corpus sample, the subject-related words associated with the corpus subject, and the subject part of speech taken by the corpus subject;

Generating the regular model according to the corpus subject, the subject-related words, and the subject part of speech.

3. The method for determining the semantics of a polysemous word according to claim 1, characterized in that matching the user sentence against the regular model in combination with the user polysemous word parts of speech and the user polysemous word semantics, and determining the target polysemous word part of speech and the target polysemous word semantics corresponding to the user polysemous word in the user sentence, specifically comprises:

Matching the user sentence against the regular model in combination with the multiple user polysemous word parts of speech and user polysemous word semantics to obtain multiple matching results;

When exactly one user polysemous word part of speech and its corresponding user polysemous word semantics yield a consistent result among the matching results, taking that user polysemous word part of speech and its corresponding user polysemous word semantics as the target polysemous word part of speech and the target polysemous word semantics.

4. The method for determining the semantics of a polysemous word according to claim 3, characterized in that matching the user sentence against the regular model in combination with the user polysemous word parts of speech and the user polysemous word semantics, and determining the target polysemous word part of speech and the target polysemous word semantics corresponding to the user polysemous word in the user sentence, further comprises:

When multiple user polysemous word parts of speech and their corresponding user polysemous word semantics yield consistent results among the matching results, selecting one user polysemous word part of speech and its corresponding user polysemous word semantics according to the matching degree as the target polysemous word part of speech and the target polysemous word semantics, the matching degree being the proportion of the user sentence that matches the regular model.

5. The method for determining the semantics of a polysemous word according to claim 3 or 4, characterized in that matching the user sentence against the regular model in combination with the multiple user polysemous word parts of speech and user polysemous word semantics to obtain multiple matching results specifically comprises:

Performing syntactic splitting on the user sentence in combination with the user polysemous word parts of speech and the user polysemous word semantics to obtain the sentence subject in the user sentence, the sentence-subject-related words associated with the sentence subject, and the sentence subject part of speech taken by the sentence subject;

Matching the sentence subject, the sentence-subject-related words, and the sentence subject part of speech obtained for each of the multiple user polysemous word parts of speech and user polysemous word semantics against the regular model, respectively, to obtain multiple matching results.

6. A system for determining the semantics of a polysemous word, characterized by comprising:

A model generation module, configured to obtain a corpus sample and generate a regular model according to the corpus sample;

A library building module, configured to obtain polysemous words and the different polysemous word parts of speech and polysemous word semantics corresponding to the polysemous words, and to establish a polysemous word library according to the polysemous words, the polysemous word parts of speech, and the polysemous word semantics;

An acquisition module, configured to obtain a user sentence;

A word segmentation module, configured to segment the user sentence obtained by the acquisition module using a word segmentation technique to obtain user segmented words;

A comparison module, configured to compare the user segmented words obtained by the word segmentation module with the polysemous word library established by the library building module, to obtain a user polysemous word and the corresponding multiple user polysemous word parts of speech and user polysemous word semantics;

A processing module, configured to match the user sentence against the regular model generated by the model generation module, in combination with the user polysemous word parts of speech and the user polysemous word semantics obtained by the comparison module, and to determine the target polysemous word part of speech and the target polysemous word semantics corresponding to the user polysemous word in the user sentence.

7. The system for determining the semantics of a polysemous word according to claim 6, characterized in that the model generation module specifically includes:

A sample acquisition unit, configured to obtain the corpus sample;

An annotation unit, configured to perform syntactic annotation and part-of-speech tagging on the corpus sample obtained by the sample acquisition unit;

An analysis unit, configured to analyze the syntactic annotation and the part-of-speech tagging obtained by the annotation unit to obtain the corpus subject in the corpus sample, the subject-related words associated with the corpus subject, and the subject part of speech taken by the corpus subject;

A model generation unit, configured to generate the regular model according to the corpus subject, the subject-related words, and the subject part of speech obtained by the analysis unit.

8. The system for determining the semantics of a polysemous word according to claim 6, characterized in that the processing module specifically includes:

A matching unit, configured to match the user sentence against the regular model generated by the model generation module, in combination with the multiple user polysemous word parts of speech and user polysemous word semantics obtained by the comparison module, to obtain multiple matching results;

A processing unit, configured such that, when exactly one user polysemous word part of speech and its corresponding user polysemous word semantics yield a consistent result among the matching results obtained by the matching unit, that user polysemous word part of speech and its corresponding user polysemous word semantics are taken as the target polysemous word part of speech and the target polysemous word semantics.

9. The system for determining the semantics of a polysemous word according to claim 8, characterized in that the processing module further includes:

The processing unit, further configured such that, when multiple user polysemous word parts of speech and their corresponding user polysemous word semantics yield consistent results among the matching results obtained by the matching unit, one user polysemous word part of speech and its corresponding user polysemous word semantics are selected according to the matching degree as the target polysemous word part of speech and the target polysemous word semantics, the matching degree being the proportion of the user sentence that matches the regular model.

10. The system for determining the semantics of a polysemous word according to claim 8 or 9, characterized in that the matching unit specifically includes:

An analysis subunit, configured to perform syntactic splitting on the user sentence in combination with the user polysemous word parts of speech and the user polysemous word semantics obtained by the comparison module, to obtain the sentence subject in the user sentence, the sentence-subject-related words associated with the sentence subject, and the sentence subject part of speech taken by the sentence subject;

A matching subunit, configured to match the sentence subject, the sentence-subject-related words, and the sentence subject part of speech, which the analysis subunit obtains for each of the multiple user polysemous word parts of speech and user polysemous word semantics obtained by the comparison module, against the regular model generated by the model generation module, respectively, to obtain multiple matching results.