Content of the invention
It is an object of the invention to overcome the defects of the prior art by providing a semantic goodness-of-fit evaluating method and device that use Chinese as the source language and are directed at a third-party language. On the basis of this scheme, users can, starting from Chinese, accurately find third-party-language descriptions of similar or related issues, people, objects, and matters.
To achieve the above object, the present invention proposes the following technical scheme: a semantic goodness-of-fit evaluating method for third-party-language text, comprising:
S1, establishing a source background semantic model instance;
S2, establishing an expansion background semantic model instance corresponding to the source background semantic model;
S3, establishing a target background semantic model instance and a context structural model instance;
S4, calculating the semantic distances from the target background semantic model instance and the target context structural model instance to the expansion background semantic model instances, and evaluating, according to the calculated semantic distances, whether the semantics of the third-party-language target text match the provided source-language semantics.
Preferably, the source background semantic model includes the application field of the model and at least one first model attribute. Each first model attribute is represented by a two-tuple (KEYWORDi, WEIGHTi), where KEYWORDi is a source-language keyword, WEIGHTi is the weight of KEYWORDi in the whole source background semantic model, and i = 1, 2, ..., n, with i and n being natural numbers greater than or equal to 1. Establishing the source background semantic model instance includes instantiating the application field, the source-language keywords, and the weights of the model as specific string values.
Preferably, establishing the expansion background semantic model instance includes establishing a first extended model instance and a second extended model instance, where the first extended model is an extended model from the source language through an intermediate language to the target language, and the second extended model is an extended model from the source language directly to the target language.
Preferably, the first extended model and the second extended model have the same structure, each including a first foreign-language translation word list and at least one second model attribute. The first foreign-language translation word list corresponds to the application field of the source background semantic model. Each second model attribute is represented by a triple (KEYWORDLISTi, WEIGHTi, αi), where KEYWORDLISTi is the second foreign-language translation word list corresponding to KEYWORDi, WEIGHTi is the weight of KEYWORDi in the whole source background semantic model, αi is the similarity threshold between the source-language keyword and its foreign-language translation words, and i = 1, 2, ..., n, with i and n being natural numbers greater than or equal to 1. Establishing the expansion background semantic model instance includes instantiating the first foreign-language translation word list, the second foreign-language translation word lists, the weights, and the semantic distance thresholds of the model as specific string values.
Preferably, the second foreign-language translation word list KEYWORDLISTi includes at least one unit structure, and each unit structure is represented by a two-tuple (KEYWORDi, Propertyi), where KEYWORDi is the intermediate-language keyword or third-party-language keyword corresponding to the source-language keyword of the source background semantic model, Propertyi is the part of speech of KEYWORDi, and i = 1, 2, ..., n, with i and n being natural numbers greater than or equal to 1.
Preferably, the target background semantic model includes a target-text notional word list arranged in descending order of word frequency and at least one third model attribute. Each third model attribute is represented by a triple (NotationalWordi, Propertyi, WFrequencyi), where NotationalWordi is a target-text notional word, Propertyi is the part of speech of NotationalWordi, WFrequencyi is the word frequency of NotationalWordi in the target text, and i = 1, 2, ..., n, with i and n being natural numbers greater than or equal to 1. Establishing the target background semantic model instance includes instantiating the target-text notional word list, the target-text notional words NotationalWordi, the parts of speech Propertyi, and the word frequencies WFrequencyi of the model as specific string values.
Preferably, the target context structural model includes a target-text notional word list arranged in descending order of word frequency and at least one fourth model attribute. Each fourth model attribute is represented by NotationalWordArrayListi, where NotationalWordArrayListi denotes the notional-word structure array of one complete clause in the target text, and i = 1, 2, ..., n, with i and n being natural numbers greater than or equal to 1. Establishing the target context structural model instance includes instantiating the target-text notional word list and the notional-word structure arrays NotationalWordArrayListi of the model as specific string values in the third-party language.
Preferably, the notional-word structure array NotationalWordArrayListi includes multiple unit structures, and each unit structure is represented by a triple (NotationalWordi, Propertyi, WFrequencyi), where NotationalWordi is a target-text notional word, Propertyi is the part of speech of NotationalWordi, WFrequencyi is the word frequency of NotationalWordi in the target text, and i = 1, 2, ..., n, with i and n being natural numbers greater than or equal to 1.
Preferably, S4 includes:
S401, obtaining the intersection of the target background semantic model instance and the first extended model instance, and the intersection of the target background semantic model instance and the second extended model instance; if neither intersection is empty, proceeding to S402;
S402, calculating the background semantic correlation between the target background semantic model instance and the first extended model instance, and between the target background semantic model instance and the second extended model instance;
S403, calculating the sentence-structure semantic correlation between the target context structural model instance and the first extended model instance, and between the target context structural model instance and the second extended model instance;
S404, calculating the text-level correlation between the target background semantic model instance/target context structural model instance and the first extended model instance, and between the target background semantic model instance/target context structural model instance and the second extended model instance; then judging, according to the calculated text-level correlation, whether the third-party-language target text matches the semantics of the current source background semantic model instance.
Preferably, in step S401, the calculation uses the threshold-similarity intersection comparison function TSCC(δ, α, A, B) in combination with the intersection-semantics extremum function cSMVF(A, B, α);
in step S402, the calculation uses the intersection-semantics extremum function cSMVF(A, B, α);
in step S403, the calculation uses the context structural model semantic evaluation function cMSSF(A, B, α);
in step S404, the calculation uses the text-level correlation evaluation function SSRF(A, B, C, D);
where A, B, C, D are set variables, δ is the cardinality of the intersection of sets A and B, α is the similarity threshold, and δ and α are adjustable parameters.
The present invention also provides another technical scheme: a semantic goodness-of-fit evaluating device for third-party-language text, comprising:
a first model instance building module, for establishing a source background semantic model instance and, in an automatically guided manner, guiding the user to create the semantic features of the Chinese knowledge to be matched;
a second model instance building module, for establishing the expansion background semantic model instance corresponding to the source background semantic model;
a third model instance building module, for establishing a target background semantic model instance and a context structural model instance;
a semantic goodness-of-fit evaluation module, for calculating the semantic distances from the target background semantic model instance and the target context structural model instance to the expansion background semantic model instances, and evaluating, according to the calculated semantic distances, whether the semantics of the third-party-language target text match the provided source-language semantics.
Preferably, the second model instance building module includes a first extended model instance building module and a second extended model instance building module. The first extended model instance building module is used to build the first extended model, the first extended model being an extended model instance from the source language through an intermediate language to the target language; the second extended model instance building module is used to build the second extended model, the second extended model being an extended model instance from the source language to the target language.
Preferably, the semantic goodness-of-fit evaluation module includes:
an intersection computing module, for obtaining the intersection of the target background semantic model instance and the first extended model instance, and the intersection of the target background semantic model instance and the second extended model instance;
a background semantic correlation computing module, for obtaining the background semantic correlation between the target background semantic model instance and the first extended model instance, and between the target background semantic model instance and the second extended model instance;
a sentence-structure semantic correlation computing module, for obtaining the sentence-structure semantic correlation between the target context structural model instance and the first extended model instance, and between the target context structural model instance and the second extended model instance;
a text-level correlation evaluation module, for calculating the text-level correlation between the target background semantic model instance/target context structural model instance and the first extended model instance, and between the target background semantic model instance/target context structural model instance and the second extended model instance, and then judging, according to the calculated text-level correlation, whether the third-party-language target text matches the semantics of the current source background semantic model instance.
Preferably, the intersection computing module performs its calculation using the threshold-similarity intersection comparison function TSCC(δ, α, A, B) in combination with the intersection-semantics extremum function cSMVF(A, B, α);
the background semantic correlation computing module performs its calculation using the intersection-semantics extremum function cSMVF(A, B, α);
the sentence-structure semantic correlation computing module performs its calculation using the context structural model semantic evaluation function cMSSF(A, B, α);
the text-level correlation evaluation module performs its calculation using the text-level correlation evaluation function SSRF(A, B, C, D);
where A, B, C, D are set variables, δ is the cardinality of the intersection of sets A and B, α is the similarity threshold, and δ and α are adjustable parameters.
Compared with the prior art, the present invention makes it possible to semantically match and search text expressed in a third-party language using the source language (such as Chinese), which significantly expands the scope of semantic matching and searching and broadens the sources of knowledge.
Embodiment
The technical schemes of the embodiments of the present invention are described clearly and completely below in conjunction with the accompanying drawings of the present invention.
The semantic goodness-of-fit evaluating method and device for third-party-language text disclosed in the embodiments of the present invention use Chinese as the source language to perform semantic matching and searching of third-party-language text; the source language is of course not limited to Chinese and may be another language.
As shown in Fig. 1, the semantic goodness-of-fit evaluating method for third-party-language text disclosed in the embodiment of the present invention comprises the following steps:
S1, establishing a Chinese background semantic model instance.
Specifically, in the embodiment of the present invention, the Chinese background semantic model is defined with the structure shown in Fig. 2 and is represented in tabular form. The table includes a first category field and at least one first model attribute. The first category field, represented by the variable XXX, denotes the application field to which the whole Chinese background semantic model belongs. Each first model attribute is represented by a two-tuple (KEYWORDi, WEIGHTi), where KEYWORDi is a Chinese keyword and WEIGHTi is the weight of KEYWORDi in the whole Chinese background semantic model.
The process of establishing the Chinese background semantic model instance is to instantiate the category field XXX, the source-language keywords KEYWORDi, and the weights WEIGHTi of the model as specific string values.
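For illustration only, one possible in-memory representation of such an instance is sketched below in Python; the representation, the field names, and the example values are assumptions of this sketch and form no part of the disclosed model.

    # Purely illustrative sketch; names and values are hypothetical.
    chinese_background_model_instance = {
        "application_field": "field_A",   # the category field XXX instantiated as a string
        "attributes": [
            # each first model attribute is a two-tuple (KEYWORDi, WEIGHTi)
            ("keyword_1", 0.6),
            ("keyword_2", 0.4),
        ],
    }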
S2, establishing a foreign-language expansion background semantic model instance corresponding to the Chinese background semantic model.
Specifically, establishing the foreign-language expansion background semantic model instance involves two processes, namely establishing a first extended model instance and a second extended model instance. The first extended model instance is an extended model instance from the source language through an intermediate language (such as English) to the third-party language (en2rdEMI); the second is an extended model instance established directly from Chinese to the third-party language (rdEMI).
In the embodiments of the present invention, the first extended model and the second extended model are defined with the structures shown in Fig. 3 and Fig. 4, likewise represented in tabular form (though not limited to tables). The table shown in Fig. 3 includes a second category field and at least one second model attribute. The second category field, represented by the variable XXX-List, denotes the first foreign-language translation word list, which corresponds to the application field (XXX) of the Chinese background semantic model. Each second model attribute is represented by a triple (KEYWORDLISTi, WEIGHTi, αi), where KEYWORDLISTi is the second foreign-language translation word list corresponding to the Chinese keyword KEYWORDi, the translation words in the list being synonymous with the Chinese keyword KEYWORDi; WEIGHTi is the weight of KEYWORDi in the whole Chinese background semantic model; and αi is the similarity threshold between the second foreign-language translation word list KEYWORDLISTi and the corresponding Chinese keyword KEYWORDi. Typically, the semantic distance between the second foreign-language translation word list KEYWORDLISTi and the Chinese keyword KEYWORDi is less than αi, and the smaller the semantic distance between them, the closer their semantics.
Preferably, the second foreign-language translation word list KEYWORDLISTi is defined with the structure shown in Fig. 4, likewise represented in tabular form (though not limited to a table). The table includes at least one unit structure, and each unit structure is represented by a two-tuple (KEYWORDi, Propertyi), where KEYWORDi is the English keyword or third-party target-language keyword corresponding to the Chinese keyword in the Chinese background semantic model, and Propertyi is the part of speech of KEYWORDi.
The process of establishing the foreign-language expansion background semantic model instances (i.e., establishing the extended model instance from Chinese through English to the third-party language and the extended model instance from Chinese to the third-party language) is specifically to instantiate the first foreign-language translation word list XXX-List, the second foreign-language translation word lists KEYWORDLISTi, the weights WEIGHTi, and the similarity thresholds αi of the models as specific string values in the third-party language, obtaining en2rdEMI and rdEMI.
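As a purely illustrative sketch of the structure of Fig. 3 and Fig. 4 (the Python representation, names, and values below are assumptions, not part of the disclosure), an extended model instance such as rdEMI or en2rdEMI could be represented as follows:

    # Illustrative sketch only; names and values are hypothetical.
    extended_model_instance = {
        "translation_word_list": "field_A_list",   # first foreign-language translation word list (XXX-List)
        "attributes": [
            {
                # KEYWORDLISTi: second foreign-language translation word list,
                # each unit structure a (KEYWORDi, Propertyi) two-tuple
                "keyword_list": [("translation_1", "noun"), ("translation_2", "verb")],
                "weight": 0.6,    # WEIGHTi carried over from the Chinese background semantic model
                "alpha": 0.75,    # similarity threshold between the Chinese keyword and its translations
            },
        ],
    }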
S3, establishing the background semantic model instance (rdBMI) and the context structural model instance (rdsCMI) of the third-party-language target text.
Specifically, in the embodiment of the present invention, the background semantic model of the third-party-language target text is defined with the structure shown in Fig. 5, likewise represented in tabular form. The table shown in Fig. 5 includes the notional word list of the third-party-language target text, also represented by the variable XXX-List, and at least one third model attribute. Each attribute is represented by a triple (NotationalWordi, Propertyi, WFrequencyi), where NotationalWordi is a notional word in the third-party-language target text, Propertyi is the part of speech of the notional word NotationalWordi in the third-party-language target text, and WFrequencyi is the word frequency of the notional word NotationalWordi in the third-party-language target text. The target-text notional words NotationalWordi in the target-text notional word list are preferably arranged in descending order of word frequency WFrequencyi.
The process of establishing the background semantic model instance of the third-party-language target text is to instantiate the target-text notional word list XXX-List, the notional words NotationalWordi in the target text, the parts of speech Propertyi, and the word frequencies WFrequencyi of the model as specific string values in the third-party language, obtaining rdBMI.
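The following Python sketch illustrates how such an instance could be assembled from the target text; it assumes that notional words and their parts of speech have already been extracted by some external tool, and it is not an implementation of the invention.

    from collections import Counter

    def build_target_background_model(notional_words):
        """notional_words: list of (word, part_of_speech) pairs taken from the target text."""
        freq = Counter(word for word, _ in notional_words)
        pos = dict(notional_words)
        # (NotationalWordi, Propertyi, WFrequencyi) triples, in descending order of word frequency
        attributes = [(word, pos[word], count) for word, count in freq.most_common()]
        return {
            "notional_word_list": [word for word, _, _ in attributes],
            "attributes": attributes,
        }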
In the embodiment of the present invention, the context structural model of the third-party-language target text is defined with the structures shown in Fig. 6 and Fig. 7, also represented in tabular form. The table shown in Fig. 6 includes the notional word list of the third-party-language target text, likewise represented by the variable XXX-List, and at least one fourth model attribute. Each fourth model attribute is represented by NotationalWordArrayListi, where NotationalWordArrayListi denotes the notional-word structure array of one complete clause in the target text.
Preferably, as shown in Fig. 7, the notional-word structure array NotationalWordArrayListi of one complete clause in the target text includes multiple unit structures, and each unit structure is represented by a triple (NotationalWordi, Propertyi, WFrequencyi), where NotationalWordi is a notional word in the third-party-language target text, Propertyi is the part of speech of the notional word NotationalWordi in the third-party-language target text, and WFrequencyi is the word frequency of the notional word NotationalWordi in the third-party-language target text. The target-text notional words NotationalWordi in the target-text notional word list are preferably arranged in descending order of word frequency WFrequencyi.
The process of establishing the context structural model instance of the third-party-language target text is to instantiate the notional word list XXX-List and the notional-word structure arrays NotationalWordArrayListi of the model as specific string values in the third-party language, obtaining rdsCMI.
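Analogously, the sketch below illustrates one possible way of assembling such an instance; clause segmentation, part-of-speech tagging, and the word-frequency table are assumed to be supplied externally, and the representation is only an assumption of this sketch.

    def build_context_structural_model(clauses, word_frequency):
        """clauses: list of clauses, each a list of (word, part_of_speech) pairs."""
        arrays = []
        for clause in clauses:
            # one notional-word structure array per complete clause,
            # each unit structure a (NotationalWordi, Propertyi, WFrequencyi) triple
            arrays.append([(word, pos, word_frequency[word]) for word, pos in clause])
        notional_word_list = sorted(word_frequency, key=word_frequency.get, reverse=True)
        return {"notional_word_list": notional_word_list, "arrays": arrays}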
S4, calculating the semantic distances from the background semantic model instance (rdBMI) and the context structural model instance (rdsCMI) of the third-party-language target text to the foreign-language expansion background semantic model instances (rdEMI and en2rdEMI), and evaluating thereby whether the semantics of the third-party-language text match the provided Chinese semantics.
Specifically, as shown in Fig. 8, this comprises the following steps:
S401, obtaining the intersection of the background semantic model instance (rdBMI) of the third-party-language target text and the second extended model instance (rdEMI), and the intersection of the background semantic model instance (rdBMI) of the third-party-language target text and the first extended model instance (en2rdEMI).
Specifically, on the basis that certain conditions of the intersection-semantics extremum function cSMVF(A, B, α) are satisfied, the threshold-similarity intersection comparison function TSCC(δ, α, A, B) is used to obtain the intersection of rdBMI and rdEMI, and the intersection of rdBMI and en2rdEMI, respectively. The calculated intersection of rdBMI and rdEMI is defined as Φ1, and the intersection of rdBMI and en2rdEMI as Φ2, i.e.:
TSCC(δ, α, rdBMI, rdEMI) = Φ1;
TSCC(δ, α, rdBMI, en2rdEMI) = Φ2;
If Φ1 or Φ2 is empty, this indicates that the current third-party-language target text does not match the semantics of the current Chinese background semantic model instance, and the calculation stops; otherwise, the method proceeds to the next step, S402.
The threshold-similarity intersection comparison function TSCC(δ, α, A, B) is defined as returning the intersection of A and B if cSMVF(A, B, α) ≥ α and |A ∩ B| ≥ δ, and an empty result otherwise, where δ is the required cardinality of the intersection of sets A and B, α is the similarity threshold, and δ and α are adjustable parameters.
The intersection-semantics extremum function cSMVF(A, B, α) is defined over the elements ak ∈ A, where |A| is the cardinality of A: for 1 ≤ m, i ≤ |B|, it takes the bi ∈ B satisfying SemanticDisVal(ak, bi) ≥ SemanticDisVal(ak, bm) for every bm ∈ B and SemanticDisVal(ak, bi) ≥ α; that is, for each ak it selects the element of B whose semantic-distance value with ak is maximal and reaches the threshold α.
SemanticDisVal(ai, bi) is defined as the semantic similarity value of the keywords ai and bi computed according to the semantic distance in the keyword ontology library.
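A minimal Python sketch of one possible reading of these two definitions is given below; the aggregation of the per-keyword maxima into a single value, the use of a literal set intersection, and the helper semantic_dis_val (assumed to look up the similarity value in the keyword ontology library) are assumptions of the sketch, not part of the disclosure.

    def cSMVF(A, B, alpha, semantic_dis_val):
        """For each ak in A, take the best-matching bi in B and keep matches reaching alpha."""
        maxima = []
        for a_k in A:
            best = max((semantic_dis_val(a_k, b_i) for b_i in B), default=0.0)
            if best >= alpha:
                maxima.append(best)
        return sum(maxima) / len(A) if A else 0.0   # assumed aggregation into one extremum value

    def TSCC(delta, alpha, A, B, semantic_dis_val):
        """Threshold-similarity intersection comparison: non-empty only if both conditions hold."""
        intersection = set(A) & set(B)              # literal intersection, simplified for illustration
        if cSMVF(A, B, alpha, semantic_dis_val) >= alpha and len(intersection) >= delta:
            return intersection
        return set()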
S402, calculating the background semantic correlation between the background semantic model instance (rdBMI) of the third-party-language target text and the second extended model instance (rdEMI), and between the background semantic model instance (rdBMI) of the third-party-language target text and the first extended model instance (en2rdEMI).
Specifically, the background semantic correlation of rdBMI with rdEMI, and of rdBMI with en2rdEMI, is calculated using the intersection-semantics extremum function cSMVF(A, B, α); its definition is given above and is not repeated here.
S403, calculating the sentence-structure semantic correlation between the context structural model instance (rdsCMI) of the third-party-language target text and the second extended model instance (rdEMI), and between the context structural model instance (rdsCMI) of the third-party-language target text and the first extended model instance (en2rdEMI).
Specifically, the sentence-structure semantic correlation of rdsCMI with rdEMI, and of rdsCMI with en2rdEMI, is calculated using the context structural model semantic evaluation function cMSSF(A, B, α).
The context structural model semantic evaluation function cMSSF(A, B, α) is defined as follows. In the function, n is the number of notional-word structure arrays NotationalWordArrayListi under the corresponding attribute of the rdsCMI model structure; SSSF is the function that calculates the semantics of a single sentence structure with reference to MODEL2; δ is a tuning parameter; and SentenceNumber_Rd is the number of single sentences actually contained in the third-party network text.
Within SSSF, n is the length of the Array_keywordsi array; θ is the weight of the word in the foreign-language expansion background semantic model MODEL2 that is most similar to Nwi (i.e., WEIGHTi in the unit structure of Fig. 3); Nwi is a notional-word element of Array_keywordsi; and ε takes the value ρ if Nwi has the same part of speech as that most similar word, and the value σ otherwise, where ρ and σ are tuning parameters.
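Since the exact cMSSF/SSSF formulas are not reproduced in this text, the following Python sketch only mirrors the components just described (θ, ε, δ, SentenceNumber_Rd); the way they are combined here is an assumption of the sketch, not the disclosed formula.

    def SSSF(array_keywords, model2, semantic_dis_val, rho, sigma):
        """Single-sentence structure semantics of one notional-word structure array against MODEL2.

        model2: list of (word, part_of_speech, weight) unit structures of the
        foreign-language expansion background semantic model (cf. Fig. 3).
        """
        score = 0.0
        for nw, pos, _freq in array_keywords:
            # most similar word in MODEL2 and its weight theta
            best_word, best_pos, theta = max(model2, key=lambda unit: semantic_dis_val(nw, unit[0]))
            eps = rho if pos == best_pos else sigma   # part-of-speech factor epsilon
            score += theta * eps * semantic_dis_val(nw, best_word)
        return score / len(array_keywords) if array_keywords else 0.0

    def cMSSF(rdsCMI, model2, semantic_dis_val, delta, rho, sigma, sentence_number_rd):
        """Context structural model semantic evaluation over all clause arrays in rdsCMI."""
        total = sum(SSSF(arr, model2, semantic_dis_val, rho, sigma) for arr in rdsCMI["arrays"])
        return delta * total / sentence_number_rd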
S404, calculating the text-level correlation between the background semantic model instance (rdBMI)/context structural model instance (rdsCMI) of the third-party-language target text and the second extended model instance (rdEMI), and between the background semantic model instance (rdBMI)/context structural model instance (rdsCMI) of the third-party-language target text and the first extended model instance (en2rdEMI); then judging, according to the calculated text-level correlation, whether the third-party-language target text matches the semantics of the current Chinese background semantic model instance.
Specifically, the text-level correlation of rdBMI/rdsCMI with rdEMI, and of rdBMI/rdsCMI with en2rdEMI, is calculated using the text-level correlation evaluation function SSRF(A, B, C, D).
The text-level correlation evaluation function SSRF(A, B, C, D) is defined as follows: if SSRF(rdsCMI, rdBMI, rdEMI, en2rdEMI) ≥ ω, the semantics of the third-party-language target text are considered to match the semantics of the current Chinese background semantic model instance; otherwise, the two are considered not to match, where ω is an adjustable parameter.
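How SSRF combines its four inputs is likewise not reproduced in this text; the sketch below therefore only illustrates the final threshold decision, assuming (purely for illustration) an equal-weight average of the correlation scores obtained in S402 and S403.

    def ssrf_sketch(background_corr_rdEMI, background_corr_en2rdEMI,
                    structure_corr_rdEMI, structure_corr_en2rdEMI):
        """Assumed text-level correlation from the background and sentence-structure correlations."""
        return (background_corr_rdEMI + background_corr_en2rdEMI
                + structure_corr_rdEMI + structure_corr_en2rdEMI) / 4.0

    def semantics_match(ssrf_value, omega):
        """The target text matches the source semantics when the text-level correlation reaches omega."""
        return ssrf_value >= omega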
It should be noted that, in the above, i = 1, 2, ..., n, where i and n are natural numbers greater than or equal to 1.
The semantic goodness-of-fit evaluating device for third-party-language text provided by the present invention includes:
a first model instance building module, for establishing a source background semantic model instance and, in an automatically guided manner, guiding the user to create the semantic features of the Chinese knowledge to be matched.
Specifically, the structure of the Chinese background semantic model constructed by the first model instance building module is described in detail above; the first model instance building module is used to instantiate the application field, the source-language keywords, and the weights of the model as specific string values.
a second model instance building module, for establishing the expansion background semantic model instance corresponding to the source background semantic model.
Specifically, the second model instance building module includes a first extended model instance building module and a second extended model instance building module. The first extended model instance building module is used to build the first extended model, the first extended model being an extended model instance from the source language through an intermediate language to the target language; the second extended model instance building module is used to build the second extended model, the second extended model being an extended model instance from the source language to the target language.
The first extended model and the second extended model have the same structure, as described above. The second model instance building module is used to instantiate the first foreign-language translation word list, the second foreign-language translation word lists, the weights, and the similarity thresholds of the models as specific string values in the third-party language.
a third model instance building module, for establishing a target background semantic model instance and a context structural model instance.
The structures of the target background semantic model and the context structural model are described above. The third model instance building module is used to instantiate the target-text notional word list, the target-text notional words NotationalWordi, the parts of speech Propertyi, and the word frequencies WFrequencyi in the target background semantic model as specific string values in the third-party language, and to instantiate the target-text notional word list and the notional-word structure arrays NotationalWordArrayListi in the target context structural model as specific string values in the third-party language.
a semantic goodness-of-fit evaluation module, for calculating the semantic distances from the target background semantic model instance and the target context structural model instance to the expansion background semantic model instances, and evaluating, according to the calculated semantic distances, whether the semantics of the third-party-language target text match the provided source-language semantics.
Specifically, the semantic goodness-of-fit evaluation module includes:
an intersection computing module, for obtaining the intersection of the target background semantic model instance and the first extended model instance, and the intersection of the target background semantic model instance and the second extended model instance;
a background semantic correlation computing module, for obtaining the background semantic correlation between the target background semantic model instance and the first extended model instance, and between the target background semantic model instance and the second extended model instance;
a sentence-structure semantic correlation computing module, for obtaining the sentence-structure semantic correlation between the target context structural model instance and the first extended model instance, and between the target context structural model instance and the second extended model instance;
a text-level correlation evaluation module, for calculating the text-level correlation between the target background semantic model instance/target context structural model instance and the first extended model instance, and between the target background semantic model instance/target context structural model instance and the second extended model instance, and then judging, according to the calculated text-level correlation, whether the third-party-language target text matches the semantics of the current source background semantic model instance.
Preferably, the intersection computing module performs its calculation using the threshold-similarity intersection comparison function TSCC(δ, α, A, B) in combination with the intersection-semantics extremum function cSMVF(A, B, α).
The background semantic correlation computing module performs its calculation using the intersection-semantics extremum function cSMVF(A, B, α).
The sentence-structure semantic correlation computing module performs its calculation using the context structural model semantic evaluation function cMSSF(A, B, α).
The text-level correlation evaluation module performs its calculation using the text-level correlation evaluation function SSRF(A, B, C, D).
Here, A, B, C, D are set variables, δ is the cardinality of the intersection of sets A and B, α is the similarity threshold, and δ and α are adjustable parameters.
These functions are described in detail above in connection with the evaluating method and are not repeated here.
The technical contents and technical features of the present invention have been disclosed as above; however, those skilled in the art may still make various substitutions and modifications based on the teachings and disclosure of the present invention without departing from its spirit. Therefore, the scope of protection of the present invention should not be limited to the contents disclosed in the embodiments, but should include the various substitutions and modifications that do not depart from the present invention, as covered by the claims of the present patent application.