Content of the invention
It is an object of the invention to overcome the defects of the prior art by providing a semantic goodness-of-fit evaluating method and device that use Chinese as the source language and are directed at a third-party language. On the basis of this scheme, users can, starting from Chinese, accurately find third-party-language descriptions of similar or related issues, people, objects, and matters.
To achieve the above object, the present invention proposes the following technical scheme: a semantic goodness-of-fit evaluating method for third-party-language text, comprising:
S1, establishing a source background semantic model instance;
S2, establishing an expansion background semantic model instance corresponding to the source background semantic model;
S3, establishing a target background semantic model instance and a context structural model instance;
S4, calculating the semantic distances from the target background semantic model instance and the target context structural model instance to the expansion background semantic model instances, and evaluating, according to the calculated semantic distances, whether the semantics of the third-party-language target text match the provided source-language semantics.
Preferably, the source background semantic model includes the application field of the model and at least one first model attribute. Each first model attribute is represented by a two-tuple (KEYWORDi, WEIGHTi), where KEYWORDi is a source-language keyword, WEIGHTi is the weight of KEYWORDi in the whole source background semantic model, and i = 1, 2, ..., n, with i and n being natural numbers greater than or equal to 1. Establishing the source background semantic model instance includes instantiating the application field, the source-language keywords, and the weights of the model as specific string values.
Preferably, establishing the expansion background semantic model instance includes establishing a first extended model instance and a second extended model instance, where the first extended model is an extended model from the source language through an intermediate language to the target language, and the second extended model is an extended model from the source language directly to the target language.
Preferably, the first extended model and the second extended model have the same structure, each including a first foreign-language translation word list and at least one second model attribute. The first foreign-language translation word list corresponds to the application field of the source background semantic model. Each second model attribute is represented by a triple (KEYWORDLISTi, WEIGHTi, αi), where KEYWORDLISTi is the second foreign-language translation word list corresponding to KEYWORDi, WEIGHTi is the weight of KEYWORDi in the whole source background semantic model, αi is the similarity threshold between the source-language keyword and its foreign-language translation words, and i = 1, 2, ..., n, with i and n being natural numbers greater than or equal to 1. Establishing the expansion background semantic model instance includes instantiating the first foreign-language translation word list, the second foreign-language translation word lists, the weights, and the semantic distance thresholds of the model as specific string values.
Preferably, the second foreign-language translation word list KEYWORDLISTi includes at least one unit structure, and each unit structure is represented by a two-tuple (KEYWORDi, Propertyi), where KEYWORDi is the intermediate-language keyword or third-party-language keyword corresponding to the source-language keyword of the source background semantic model, Propertyi is the part of speech of KEYWORDi, and i = 1, 2, ..., n, with i and n being natural numbers greater than or equal to 1.
Preferably, the target background semantic model includes a target-text notional word list arranged in descending order of word frequency and at least one third model attribute. Each third model attribute is represented by a triple (NotationalWordi, Propertyi, WFrequencyi), where NotationalWordi is a target-text notional word, Propertyi is the part of speech of NotationalWordi, WFrequencyi is the word frequency of NotationalWordi in the target text, and i = 1, 2, ..., n, with i and n being natural numbers greater than or equal to 1. Establishing the target background semantic model instance includes instantiating the target-text notional word list, the target-text notional words NotationalWordi, the parts of speech Propertyi, and the word frequencies WFrequencyi of the model as specific string values.
Preferably, the target context structural model includes a target-text notional word list arranged in descending order of word frequency and at least one fourth model attribute. Each fourth model attribute is represented by NotationalWordArrayListi, where NotationalWordArrayListi denotes the notional-word structure array of one complete clause in the target text, and i = 1, 2, ..., n, with i and n being natural numbers greater than or equal to 1. Establishing the target context structural model instance includes instantiating the target-text notional word list and the notional-word structure arrays NotationalWordArrayListi of the model as specific string values in the third-party language.
Preferably, the notional-word structure array NotationalWordArrayListi includes multiple unit structures, and each unit structure is represented by a triple (NotationalWordi, Propertyi, WFrequencyi), where NotationalWordi is a target-text notional word, Propertyi is the part of speech of NotationalWordi, WFrequencyi is the word frequency of NotationalWordi in the target text, and i = 1, 2, ..., n, with i and n being natural numbers greater than or equal to 1.
Preferably, S4 includes:
S401, obtaining the intersection of the target background semantic model instance and the first extended model instance, and the intersection of the target background semantic model instance and the second extended model instance; if neither intersection is empty, proceeding to S402;
S402, calculating the background semantic correlation between the target background semantic model instance and the first extended model instance, and between the target background semantic model instance and the second extended model instance;
S403, calculating the sentence-structure semantic correlation between the target context structural model instance and the first extended model instance, and between the target context structural model instance and the second extended model instance;
S404, calculating the text-level correlation between the target background semantic model instance/target context structural model instance and the first extended model instance, and between the target background semantic model instance/target context structural model instance and the second extended model instance; then judging, according to the calculated text-level correlation, whether the third-party-language target text matches the semantics of the current source background semantic model instance.
Preferably, in step S401, the calculation uses the threshold-similarity intersection comparison function TSCC(δ, α, A, B) in combination with the intersection-semantics extremum function cSMVF(A, B, α);
in step S402, the calculation uses the intersection-semantics extremum function cSMVF(A, B, α);
in step S403, the calculation uses the context structural model semantic evaluation function cMSSF(A, B, α);
in step S404, the calculation uses the text-level correlation evaluation function SSRF(A, B, C, D);
where A, B, C, D are set variables, δ is the cardinality of the intersection of sets A and B, α is the similarity threshold, and δ and α are adjustable parameters.
The present invention also provides another technical scheme: a semantic goodness-of-fit evaluating device for third-party-language text, comprising:
a first model instance building module, for establishing a source background semantic model instance and, in an automatically guided manner, guiding the user to create the semantic features of the Chinese knowledge to be matched;
a second model instance building module, for establishing the expansion background semantic model instance corresponding to the source background semantic model;
a third model instance building module, for establishing a target background semantic model instance and a context structural model instance;
a semantic goodness-of-fit evaluation module, for calculating the semantic distances from the target background semantic model instance and the target context structural model instance to the expansion background semantic model instances, and evaluating, according to the calculated semantic distances, whether the semantics of the third-party-language target text match the provided source-language semantics.
Preferably, the second model instance building module includes a first extended model instance building module and a second extended model instance building module. The first extended model instance building module is used to build the first extended model, the first extended model being an extended model instance from the source language through an intermediate language to the target language; the second extended model instance building module is used to build the second extended model, the second extended model being an extended model instance from the source language to the target language.
Preferably, the semantic goodness-of-fit evaluation module includes:
an intersection computing module, for obtaining the intersection of the target background semantic model instance and the first extended model instance, and the intersection of the target background semantic model instance and the second extended model instance;
a background semantic correlation computing module, for obtaining the background semantic correlation between the target background semantic model instance and the first extended model instance, and between the target background semantic model instance and the second extended model instance;
a sentence-structure semantic correlation computing module, for obtaining the sentence-structure semantic correlation between the target context structural model instance and the first extended model instance, and between the target context structural model instance and the second extended model instance;
a text-level correlation evaluation module, for calculating the text-level correlation between the target background semantic model instance/target context structural model instance and the first extended model instance, and between the target background semantic model instance/target context structural model instance and the second extended model instance, and then judging, according to the calculated text-level correlation, whether the third-party-language target text matches the semantics of the current source background semantic model instance.
Preferably, the intersection computing module performs its calculation using the threshold-similarity intersection comparison function TSCC(δ, α, A, B) in combination with the intersection-semantics extremum function cSMVF(A, B, α);
the background semantic correlation computing module performs its calculation using the intersection-semantics extremum function cSMVF(A, B, α);
the sentence-structure semantic correlation computing module performs its calculation using the context structural model semantic evaluation function cMSSF(A, B, α);
the text-level correlation evaluation module performs its calculation using the text-level correlation evaluation function SSRF(A, B, C, D);
where A, B, C, D are set variables, δ is the cardinality of the intersection of sets A and B, α is the similarity threshold, and δ and α are adjustable parameters.
Compared with the prior art, the present invention makes it possible to semantically match and search text expressed in a third-party language using the source language (such as Chinese), which significantly expands the scope of semantic matching and searching and broadens the sources of knowledge.
Embodiment
The technical schemes of the embodiments of the present invention are described clearly and completely below in conjunction with the accompanying drawings of the present invention.
The semantic goodness-of-fit evaluating method and device for third-party-language text disclosed in the embodiments of the present invention use Chinese as the source language to perform semantic matching and searching of third-party-language text; the source language is of course not limited to Chinese and may be another language.
As shown in Fig. 1, the semantic goodness-of-fit evaluating method for third-party-language text disclosed in the embodiment of the present invention comprises the following steps:
S1, establishing a Chinese background semantic model instance.
Specifically, in the embodiment of the present invention, the Chinese background semantic model is defined with the structure shown in Fig. 2 and is represented in tabular form. The table includes a first category field and at least one first model attribute. The first category field, represented by the variable XXX, denotes the application field to which the whole Chinese background semantic model belongs. Each first model attribute is represented by a two-tuple (KEYWORDi, WEIGHTi), where KEYWORDi is a Chinese keyword and WEIGHTi is the weight of KEYWORDi in the whole Chinese background semantic model.
The process of establishing the Chinese background semantic model instance is to instantiate the category field XXX, the source-language keywords KEYWORDi, and the weights WEIGHTi of the model as specific string values.
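For illustration only, one possible in-memory representation of such an instance is sketched below in Python; the representation, the field names, and the example values are assumptions of this sketch and form no part of the disclosed model.

    # Purely illustrative sketch; names and values are hypothetical.
    chinese_background_model_instance = {
        "application_field": "field_A",   # the category field XXX instantiated as a string
        "attributes": [
            # each first model attribute is a two-tuple (KEYWORDi, WEIGHTi)
            ("keyword_1", 0.6),
            ("keyword_2", 0.4),
        ],
    }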
S2, establishing a foreign-language expansion background semantic model instance corresponding to the Chinese background semantic model.
Specifically, establishing the foreign-language expansion background semantic model instance involves two processes, namely establishing a first extended model instance and a second extended model instance. The first extended model instance is an extended model instance from the source language through an intermediate language (such as English) to the third-party language (en2rdEMI); the second is an extended model instance established directly from Chinese to the third-party language (rdEMI).
In the embodiments of the present invention, the first extended model and the second extended model are defined with the structures shown in Fig. 3 and Fig. 4, likewise represented in tabular form (though not limited to tables). The table shown in Fig. 3 includes a second category field and at least one second model attribute. The second category field, represented by the variable XXX-List, denotes the first foreign-language translation word list, which corresponds to the application field (XXX) of the Chinese background semantic model. Each second model attribute is represented by a triple (KEYWORDLISTi, WEIGHTi, αi), where KEYWORDLISTi is the second foreign-language translation word list corresponding to the Chinese keyword KEYWORDi, the translation words in the list being synonymous with the Chinese keyword KEYWORDi; WEIGHTi is the weight of KEYWORDi in the whole Chinese background semantic model; and αi is the similarity threshold between the second foreign-language translation word list KEYWORDLISTi and the corresponding Chinese keyword KEYWORDi. Typically, the semantic distance between the second foreign-language translation word list KEYWORDLISTi and the Chinese keyword KEYWORDi is less than αi, and the smaller the semantic distance between them, the closer their semantics.
Preferably, the second foreign-language translation word list KEYWORDLISTi is defined with the structure shown in Fig. 4, likewise represented in tabular form (though not limited to a table). The table includes at least one unit structure, and each unit structure is represented by a two-tuple (KEYWORDi, Propertyi), where KEYWORDi is the English keyword or third-party target-language keyword corresponding to the Chinese keyword in the Chinese background semantic model, and Propertyi is the part of speech of KEYWORDi.
The process of establishing the foreign-language expansion background semantic model instances (i.e., establishing the extended model instance from Chinese through English to the third-party language and the extended model instance from Chinese to the third-party language) is specifically to instantiate the first foreign-language translation word list XXX-List, the second foreign-language translation word lists KEYWORDLISTi, the weights WEIGHTi, and the similarity thresholds αi of the models as specific string values in the third-party language, obtaining en2rdEMI and rdEMI.
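As a purely illustrative sketch of the structure of Fig. 3 and Fig. 4 (the Python representation, names, and values below are assumptions, not part of the disclosure), an extended model instance such as rdEMI or en2rdEMI could be represented as follows:

    # Illustrative sketch only; names and values are hypothetical.
    extended_model_instance = {
        "translation_word_list": "field_A_list",   # first foreign-language translation word list (XXX-List)
        "attributes": [
            {
                # KEYWORDLISTi: second foreign-language translation word list,
                # each unit structure a (KEYWORDi, Propertyi) two-tuple
                "keyword_list": [("translation_1", "noun"), ("translation_2", "verb")],
                "weight": 0.6,    # WEIGHTi carried over from the Chinese background semantic model
                "alpha": 0.75,    # similarity threshold between the Chinese keyword and its translations
            },
        ],
    }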
S3, establishing the background semantic model instance (rdBMI) and the context structural model instance (rdsCMI) of the third-party-language target text.
Specifically, in the embodiment of the present invention, the background semantic model of the third-party-language target text is defined with the structure shown in Fig. 5, likewise represented in tabular form. The table shown in Fig. 5 includes the notional word list of the third-party-language target text, also represented by the variable XXX-List, and at least one third model attribute. Each attribute is represented by a triple (NotationalWordi, Propertyi, WFrequencyi), where NotationalWordi is a notional word in the third-party-language target text, Propertyi is the part of speech of the notional word NotationalWordi in the third-party-language target text, and WFrequencyi is the word frequency of the notional word NotationalWordi in the third-party-language target text. The target-text notional words NotationalWordi in the target-text notional word list are preferably arranged in descending order of word frequency WFrequencyi.
The process of establishing the background semantic model instance of the third-party-language target text is to instantiate the target-text notional word list XXX-List, the notional words NotationalWordi in the target text, the parts of speech Propertyi, and the word frequencies WFrequencyi of the model as specific string values in the third-party language, obtaining rdBMI.
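The following Python sketch illustrates how such an instance could be assembled from the target text; it assumes that notional words and their parts of speech have already been extracted by some external tool, and it is not an implementation of the invention.

    from collections import Counter

    def build_target_background_model(notional_words):
        """notional_words: list of (word, part_of_speech) pairs taken from the target text."""
        freq = Counter(word for word, _ in notional_words)
        pos = dict(notional_words)
        # (NotationalWordi, Propertyi, WFrequencyi) triples, in descending order of word frequency
        attributes = [(word, pos[word], count) for word, count in freq.most_common()]
        return {
            "notional_word_list": [word for word, _, _ in attributes],
            "attributes": attributes,
        }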
In the embodiment of the present invention, the context structural model of the third-party-language target text is defined with the structures shown in Fig. 6 and Fig. 7, also represented in tabular form. The table shown in Fig. 6 includes the notional word list of the third-party-language target text, likewise represented by the variable XXX-List, and at least one fourth model attribute. Each fourth model attribute is represented by NotationalWordArrayListi, where NotationalWordArrayListi denotes the notional-word structure array of one complete clause in the target text.
Preferably, as shown in Fig. 7, the notional-word structure array NotationalWordArrayListi of one complete clause in the target text includes multiple unit structures, and each unit structure is represented by a triple (NotationalWordi, Propertyi, WFrequencyi), where NotationalWordi is a notional word in the third-party-language target text, Propertyi is the part of speech of the notional word NotationalWordi in the third-party-language target text, and WFrequencyi is the word frequency of the notional word NotationalWordi in the third-party-language target text. The target-text notional words NotationalWordi in the target-text notional word list are preferably arranged in descending order of word frequency WFrequencyi.
The process of establishing the context structural model instance of the third-party-language target text is to instantiate the notional word list XXX-List and the notional-word structure arrays NotationalWordArrayListi of the model as specific string values in the third-party language, obtaining rdsCMI.
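Analogously, the sketch below illustrates one possible way of assembling such an instance; clause segmentation, part-of-speech tagging, and the word-frequency table are assumed to be supplied externally, and the representation is only an assumption of this sketch.

    def build_context_structural_model(clauses, word_frequency):
        """clauses: list of clauses, each a list of (word, part_of_speech) pairs."""
        arrays = []
        for clause in clauses:
            # one notional-word structure array per complete clause,
            # each unit structure a (NotationalWordi, Propertyi, WFrequencyi) triple
            arrays.append([(word, pos, word_frequency[word]) for word, pos in clause])
        notional_word_list = sorted(word_frequency, key=word_frequency.get, reverse=True)
        return {"notional_word_list": notional_word_list, "arrays": arrays}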
S4, calculating the semantic distances from the background semantic model instance (rdBMI) and the context structural model instance (rdsCMI) of the third-party-language target text to the foreign-language expansion background semantic model instances (rdEMI and en2rdEMI), and evaluating thereby whether the semantics of the third-party-language text match the provided Chinese semantics.
Specifically, as shown in Fig. 8, this comprises the following steps:
S401, obtaining the intersection of the background semantic model instance (rdBMI) of the third-party-language target text and the second extended model instance (rdEMI), and the intersection of the background semantic model instance (rdBMI) of the third-party-language target text and the first extended model instance (en2rdEMI).
Specifically, on the basis that certain conditions of the intersection-semantics extremum function cSMVF(A, B, α) are satisfied, the threshold-similarity intersection comparison function TSCC(δ, α, A, B) is used to obtain the intersection of rdBMI and rdEMI, and the intersection of rdBMI and en2rdEMI, respectively. The calculated intersection of rdBMI and rdEMI is defined as Φ1, and the intersection of rdBMI and en2rdEMI as Φ2, i.e.:
TSCC(δ, α, rdBMI, rdEMI) = Φ1;
TSCC(δ, α, rdBMI, en2rdEMI) = Φ2;
If Φ1 or Φ2 is empty, this indicates that the current third-party-language target text does not match the semantics of the current Chinese background semantic model instance, and the calculation stops; otherwise, the method proceeds to the next step, S402.
The threshold-similarity intersection comparison function TSCC(δ, α, A, B) is defined as returning the intersection of A and B if cSMVF(A, B, α) ≥ α and |A ∩ B| ≥ δ, and an empty result otherwise, where δ is the required cardinality of the intersection of sets A and B, α is the similarity threshold, and δ and α are adjustable parameters.
The intersection-semantics extremum function cSMVF(A, B, α) is defined over the elements ak ∈ A, where |A| is the cardinality of A: for 1 ≤ m, i ≤ |B|, it takes the bi ∈ B satisfying SemanticDisVal(ak, bi) ≥ SemanticDisVal(ak, bm) for every bm ∈ B and SemanticDisVal(ak, bi) ≥ α; that is, for each ak it selects the element of B whose semantic-distance value with ak is maximal and reaches the threshold α.
SemanticDisVal(ai, bi) is defined as the semantic similarity value of the keywords ai and bi computed according to the semantic distance in the keyword ontology library.
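A minimal Python sketch of one possible reading of these two definitions is given below; the aggregation of the per-keyword maxima into a single value, the use of a literal set intersection, and the helper semantic_dis_val (assumed to look up the similarity value in the keyword ontology library) are assumptions of the sketch, not part of the disclosure.

    def cSMVF(A, B, alpha, semantic_dis_val):
        """For each ak in A, take the best-matching bi in B and keep matches reaching alpha."""
        maxima = []
        for a_k in A:
            best = max((semantic_dis_val(a_k, b_i) for b_i in B), default=0.0)
            if best >= alpha:
                maxima.append(best)
        return sum(maxima) / len(A) if A else 0.0   # assumed aggregation into one extremum value

    def TSCC(delta, alpha, A, B, semantic_dis_val):
        """Threshold-similarity intersection comparison: non-empty only if both conditions hold."""
        intersection = set(A) & set(B)              # literal intersection, simplified for illustration
        if cSMVF(A, B, alpha, semantic_dis_val) >= alpha and len(intersection) >= delta:
            return intersection
        return set()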
S402, calculating the background semantic correlation between the background semantic model instance (rdBMI) of the third-party-language target text and the second extended model instance (rdEMI), and between the background semantic model instance (rdBMI) of the third-party-language target text and the first extended model instance (en2rdEMI).
Specifically, the background semantic correlation of rdBMI with rdEMI, and of rdBMI with en2rdEMI, is calculated using the intersection-semantics extremum function cSMVF(A, B, α); its definition is given above and is not repeated here.
S403, calculating the sentence-structure semantic correlation between the context structural model instance (rdsCMI) of the third-party-language target text and the second extended model instance (rdEMI), and between the context structural model instance (rdsCMI) of the third-party-language target text and the first extended model instance (en2rdEMI).
Specifically, the sentence-structure semantic correlation of rdsCMI with rdEMI, and of rdsCMI with en2rdEMI, is calculated using the context structural model semantic evaluation function cMSSF(A, B, α).
The context structural model semantic evaluation function cMSSF(A, B, α) is defined as follows. In the function, n is the number of notional-word structure arrays NotationalWordArrayListi under the corresponding attribute of the rdsCMI model structure; SSSF is the function that calculates the semantics of a single sentence structure with reference to MODEL2; δ is a tuning parameter; and SentenceNumber_Rd is the number of single sentences actually contained in the third-party network text.
Within SSSF, n is the length of the Array_keywordsi array; θ is the weight of the word in the foreign-language expansion background semantic model MODEL2 that is most similar to Nwi (i.e., WEIGHTi in the unit structure of Fig. 3); Nwi is a notional-word element of Array_keywordsi; and ε takes the value ρ if Nwi has the same part of speech as that most similar word, and the value σ otherwise, where ρ and σ are tuning parameters.
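Since the exact cMSSF/SSSF formulas are not reproduced in this text, the following Python sketch only mirrors the components just described (θ, ε, δ, SentenceNumber_Rd); the way they are combined here is an assumption of the sketch, not the disclosed formula.

    def SSSF(array_keywords, model2, semantic_dis_val, rho, sigma):
        """Single-sentence structure semantics of one notional-word structure array against MODEL2.

        model2: list of (word, part_of_speech, weight) unit structures of the
        foreign-language expansion background semantic model (cf. Fig. 3).
        """
        score = 0.0
        for nw, pos, _freq in array_keywords:
            # most similar word in MODEL2 and its weight theta
            best_word, best_pos, theta = max(model2, key=lambda unit: semantic_dis_val(nw, unit[0]))
            eps = rho if pos == best_pos else sigma   # part-of-speech factor epsilon
            score += theta * eps * semantic_dis_val(nw, best_word)
        return score / len(array_keywords) if array_keywords else 0.0

    def cMSSF(rdsCMI, model2, semantic_dis_val, delta, rho, sigma, sentence_number_rd):
        """Context structural model semantic evaluation over all clause arrays in rdsCMI."""
        total = sum(SSSF(arr, model2, semantic_dis_val, rho, sigma) for arr in rdsCMI["arrays"])
        return delta * total / sentence_number_rd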
S404, calculating the text-level correlation between the background semantic model instance (rdBMI)/context structural model instance (rdsCMI) of the third-party-language target text and the second extended model instance (rdEMI), and between the background semantic model instance (rdBMI)/context structural model instance (rdsCMI) of the third-party-language target text and the first extended model instance (en2rdEMI); then judging, according to the calculated text-level correlation, whether the third-party-language target text matches the semantics of the current Chinese background semantic model instance.
Specifically, the text-level correlation of rdBMI/rdsCMI with rdEMI, and of rdBMI/rdsCMI with en2rdEMI, is calculated using the text-level correlation evaluation function SSRF(A, B, C, D).
The text-level correlation evaluation function SSRF(A, B, C, D) is defined as follows: if SSRF(rdsCMI, rdBMI, rdEMI, en2rdEMI) ≥ ω, the semantics of the third-party-language target text are considered to match the semantics of the current Chinese background semantic model instance; otherwise, the two are considered not to match, where ω is an adjustable parameter.
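How SSRF combines its four inputs is likewise not reproduced in this text; the sketch below therefore only illustrates the final threshold decision, assuming (purely for illustration) an equal-weight average of the correlation scores obtained in S402 and S403.

    def ssrf_sketch(background_corr_rdEMI, background_corr_en2rdEMI,
                    structure_corr_rdEMI, structure_corr_en2rdEMI):
        """Assumed text-level correlation from the background and sentence-structure correlations."""
        return (background_corr_rdEMI + background_corr_en2rdEMI
                + structure_corr_rdEMI + structure_corr_en2rdEMI) / 4.0

    def semantics_match(ssrf_value, omega):
        """The target text matches the source semantics when the text-level correlation reaches omega."""
        return ssrf_value >= omega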
It should be noted that, in the above, i = 1, 2, ..., n, where i and n are natural numbers greater than or equal to 1.
The semantic goodness-of-fit evaluating device for third-party-language text provided by the present invention includes:
a first model instance building module, for establishing a source background semantic model instance and, in an automatically guided manner, guiding the user to create the semantic features of the Chinese knowledge to be matched.
Specifically, the structure of the Chinese background semantic model constructed by the first model instance building module is described in detail above; the first model instance building module is used to instantiate the application field, the source-language keywords, and the weights of the model as specific string values.
a second model instance building module, for establishing the expansion background semantic model instance corresponding to the source background semantic model.
Specifically, the second model instance building module includes a first extended model instance building module and a second extended model instance building module. The first extended model instance building module is used to build the first extended model, the first extended model being an extended model instance from the source language through an intermediate language to the target language; the second extended model instance building module is used to build the second extended model, the second extended model being an extended model instance from the source language to the target language.
The first extended model and the second extended model have the same structure, as described above. The second model instance building module is used to instantiate the first foreign-language translation word list, the second foreign-language translation word lists, the weights, and the similarity thresholds of the models as specific string values in the third-party language.
a third model instance building module, for establishing a target background semantic model instance and a context structural model instance.
The structures of the target background semantic model and the context structural model are described above. The third model instance building module is used to instantiate the target-text notional word list, the target-text notional words NotationalWordi, the parts of speech Propertyi, and the word frequencies WFrequencyi in the target background semantic model as specific string values in the third-party language, and to instantiate the target-text notional word list and the notional-word structure arrays NotationalWordArrayListi in the target context structural model as specific string values in the third-party language.
a semantic goodness-of-fit evaluation module, for calculating the semantic distances from the target background semantic model instance and the target context structural model instance to the expansion background semantic model instances, and evaluating, according to the calculated semantic distances, whether the semantics of the third-party-language target text match the provided source-language semantics.
Specifically, the semantic goodness-of-fit evaluation module includes:
an intersection computing module, for obtaining the intersection of the target background semantic model instance and the first extended model instance, and the intersection of the target background semantic model instance and the second extended model instance;
a background semantic correlation computing module, for obtaining the background semantic correlation between the target background semantic model instance and the first extended model instance, and between the target background semantic model instance and the second extended model instance;
a sentence-structure semantic correlation computing module, for obtaining the sentence-structure semantic correlation between the target context structural model instance and the first extended model instance, and between the target context structural model instance and the second extended model instance;
a text-level correlation evaluation module, for calculating the text-level correlation between the target background semantic model instance/target context structural model instance and the first extended model instance, and between the target background semantic model instance/target context structural model instance and the second extended model instance, and then judging, according to the calculated text-level correlation, whether the third-party-language target text matches the semantics of the current source background semantic model instance.
Preferably, the intersection computing module performs its calculation using the threshold-similarity intersection comparison function TSCC(δ, α, A, B) in combination with the intersection-semantics extremum function cSMVF(A, B, α).
The background semantic correlation computing module performs its calculation using the intersection-semantics extremum function cSMVF(A, B, α).
The sentence-structure semantic correlation computing module performs its calculation using the context structural model semantic evaluation function cMSSF(A, B, α).
The text-level correlation evaluation module performs its calculation using the text-level correlation evaluation function SSRF(A, B, C, D).
Here, A, B, C, D are set variables, δ is the cardinality of the intersection of sets A and B, α is the similarity threshold, and δ and α are adjustable parameters.
These functions are described in detail above in connection with the evaluating method and are not repeated here.
The technical contents and technical features of the present invention have been disclosed as above; however, those skilled in the art may still make various substitutions and modifications based on the teachings and disclosure of the present invention without departing from its spirit. Therefore, the scope of protection of the present invention should not be limited to the contents disclosed in the embodiments, but should include the various substitutions and modifications that do not depart from the present invention, as covered by the claims of the present patent application.