CN107193800A - Semantic fitness evaluation method and device for third-party language text - Google Patents

Semantic fitness evaluation method and device for third-party language text

Info

Publication number
CN107193800A
CN107193800A
Authority
CN
China
Prior art keywords
semantic
model example
model
target
language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710353875.3A
Other languages
Chinese (zh)
Other versions
CN107193800B (en)
Inventor
Fu Chaoyang (傅朝阳)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Heiyun Technology Co ltd
Suzhou Black Cloud Intelligent Technology Co ltd
Original Assignee
Suzhou Heiyun Information Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Heiyun Information Technology Co., Ltd.
Priority to CN201710353875.3A priority Critical patent/CN107193800B/en
Publication of CN107193800A publication Critical patent/CN107193800A/en
Application granted granted Critical
Publication of CN107193800B publication Critical patent/CN107193800B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/42Data-driven translation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The present invention discloses a semantic fitness evaluation method and device for third-party language text. The method includes: S1, establishing a source background semantic model instance; S2, establishing an expanded background semantic model instance corresponding to the source background semantic model; S3, establishing a target background semantic model instance and a target context structural model instance; S4, computing the semantic distances from the target background semantic model instance and the target context structural model instance to the expanded background semantic model instance, and evaluating, according to the computed semantic distances, whether the semantics of the third-party language target text match the provided source-language semantics. The present invention enables semantic matching and retrieval of text expressed in a third-party language on the basis of a source language (such as Chinese), broadening the scope of semantic matching and retrieval and significantly expanding the available sources of knowledge.

Description

Semantic fitness evaluation method and device for third-party language text
Technical field
The present invention relates to the technical fields of natural language processing and text semantic analysis, and in particular to a semantic fitness evaluation method and device for third-party language text.
Background art
In everyday work and life, the following need often arises: without knowing the language of a third country, a user wishes to learn, through the network, how the same problem is understood in the country that uses that third-party language. At present this problem has no effective solution, essentially because an effective semantic matching scheme from Chinese to the language of the third country is still lacking.
Summary of the invention
The object of the present invention is to overcome the defects of the prior art by providing a semantic fitness evaluation method and device for third-party language text with Chinese as the source language. Based on this scheme, users can, starting from Chinese, accurately find the desired third-party language descriptions of identical or similar related problems, or of people, events and things.
To achieve the above object, the present invention proposes the following technical solution: a semantic fitness evaluation method for third-party language text, comprising:
S1, establishing a source background semantic model instance;
S2, establishing an expanded background semantic model instance corresponding to the source background semantic model;
S3, establishing a target background semantic model instance and a target context structural model instance;
S4, computing the semantic distances from the target background semantic model instance and the target context structural model instance to the expanded background semantic model instance, and evaluating, according to the computed semantic distances, whether the semantics of the third-party language target text match the provided source-language semantics.
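For illustration only, the following Python sketch makes the S1–S4 flow concrete on toy data. Every keyword, weight, threshold and function name in it is an invented assumption, and exact string match stands in for the ontology-based semantic distance described later; it is not the patented computation.

```python
# Minimal end-to-end sketch of S1-S4 on toy data; all names, words and numbers
# are illustrative assumptions, and exact string match stands in for the
# ontology-based semantic distance used by the patent.

source = {"电池": 0.6, "续航": 0.4}                       # S1: source keywords and weights
rd_emi = {"电池": ["Batterie"], "续航": ["Reichweite"]}    # S2: direct source-to-target expansion
en2rd_emi = {"电池": ["Akku"], "续航": ["Reichweite"]}     # S2: expansion via an intermediate language
target_words = ["Batterie", "Reichweite", "laden"]         # S3: notional words of the target text

def correlation(expansion):
    """S4 (toy): sum the weights of source keywords whose expansion words
    appear among the target-text notional words."""
    return sum(weight for keyword, weight in source.items()
               if any(t in target_words for t in expansion[keyword]))

score = max(correlation(rd_emi), correlation(en2rd_emi))
print("match" if score >= 0.5 else "no match")             # 0.5 is an arbitrary threshold
```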
Preferably, the source background semantic model includes the application field of the model and at least one first model attribute. Each first model attribute is represented by a two-tuple (KEYWORD_i, WEIGHT_i), where KEYWORD_i is a source-language keyword, WEIGHT_i is the weight of KEYWORD_i within the whole source background semantic model, i = 1, 2, ..., n, and i and n are natural numbers greater than or equal to 1. Establishing the source background semantic model instance includes instantiating the application field, source-language keywords and weights of the model as concrete string values.
Preferably, establishing the expanded background semantic model instance includes establishing a first extended model instance and a second extended model instance, where the first extended model is an extended model from the source language through an intermediate language to the target language, and the second extended model is an extended model from the source language to the target language.
Preferably, the first extended model and the second extended model have the same structure, each comprising a first foreign-language translation word list and at least one second model attribute. The first foreign-language translation word list corresponds to the application field of the source background semantic model. Each second model attribute is represented by a triple (KEYWORDLIST_i, WEIGHT_i, α_i), where KEYWORDLIST_i is a second foreign-language translation word list corresponding to KEYWORD_i, WEIGHT_i is the weight of KEYWORD_i within the whole source background semantic model, α_i is the similarity threshold between the source-language keyword and its foreign-language translation words, i = 1, 2, ..., n, and i and n are natural numbers greater than or equal to 1. Establishing the expanded background semantic model instance includes instantiating the first foreign-language translation word list, the second foreign-language translation word lists, the weights and the similarity thresholds of the model as concrete string values.
Preferably, the second foreign-language translation word list KEYWORDLIST_i includes at least one unit structure, each unit structure being represented by a two-tuple (KEYWORD_i, Property_i), where KEYWORD_i is an intermediate-language keyword or a third-party language keyword corresponding to the source-language keyword in the source background semantic model, Property_i is the part of speech of KEYWORD_i, i = 1, 2, ..., n, and i and n are natural numbers greater than or equal to 1.
Preferably, the target background semantic model includes a target text notional word list arranged in descending order of word frequency and at least one third model attribute. Each third model attribute is represented by a triple (NotationalWord_i, Property_i, WFrequency_i), where NotationalWord_i is a target text notional word, Property_i is the part of speech of NotationalWord_i, WFrequency_i is the word frequency of NotationalWord_i in the target text, i = 1, 2, ..., n, and i and n are natural numbers greater than or equal to 1. Establishing the target background semantic model instance includes instantiating the target text notional word list, the target text notional words NotationalWord_i, the parts of speech Property_i and the word frequencies WFrequency_i of the model as concrete string values.
Preferably, the target context structural model includes a target text notional word list arranged in descending order of word frequency and at least one fourth model attribute. Each fourth model attribute is represented by NotationalWordArrayList_i, where NotationalWordArrayList_i denotes the notional word structural array of one complete clause in the target text, i = 1, 2, ..., n, and i and n are natural numbers greater than or equal to 1. Establishing the target context structural model instance includes instantiating the target text notional word list and the notional word structural arrays NotationalWordArrayList_i of the model as concrete string values in the third-party language.
Preferably, the notional word structural array NotationalWordArrayList_i includes multiple unit structures, each unit structure being represented by a triple (NotationalWord_i, Property_i, WFrequency_i), where NotationalWord_i is a target text notional word, Property_i is the part of speech of NotationalWord_i, WFrequency_i is the word frequency of NotationalWord_i in the target text, i = 1, 2, ..., n, and i and n are natural numbers greater than or equal to 1.
Preferably, S4 includes:
S401, obtaining the intersection of the target background semantic model instance with the first extended model instance, and the intersection of the target background semantic model instance with the second extended model instance, and proceeding to S402 if neither intersection is empty;
S402, computing the background semantic correlation between the target background semantic model instance and the first extended model instance, and between the target background semantic model instance and the second extended model instance;
S403, computing the sentence-structure semantic correlation between the target context structural model instance and the first extended model instance, and between the target context structural model instance and the second extended model instance;
S404, computing the overall text correlation between the target background semantic model instance / target context structural model instance and the first extended model instance, and between the target background semantic model instance / target context structural model instance and the second extended model instance, and then judging, according to the computed overall text correlation, whether the third-party language target text semantically matches the current source background semantic model instance.
Preferably, in step S401, the computation uses the threshold similarity intersection comparison function TSCC(δ, α, A, B) combined with the intersection semantic extremum function cSMVF(A, B, α);
in step S402, the computation uses the intersection semantic extremum function cSMVF(A, B, α);
in step S403, the computation uses the context structural model semantic evaluation function cMSSF(A, B, α);
in step S404, the computation uses the overall text correlation evaluation function SSRF(A, B, C, D);
where A, B, C and D are set variables, δ is the cardinality of the intersection of sets A and B, α is the similarity threshold, and δ and α are adjustable parameters.
The present invention also provides another technical solution: a semantic fitness evaluation device for third-party language text, comprising:
a first model instance construction module, configured to establish the source background semantic model instance and, in an automatically guided manner, lead the user to create the semantic features of the Chinese knowledge to be matched;
a second model instance construction module, configured to establish the expanded background semantic model instance corresponding to the source background semantic model;
a third model instance construction module, configured to establish the target background semantic model instance and the context structural model instance;
a semantic fitness evaluation module, configured to compute the semantic distances from the target background semantic model instance and the target context structural model instance to the expanded background semantic model instance, and to evaluate, according to the computed semantic distances, whether the semantics of the third-party language target text match the provided source-language semantics.
Preferably, the second model instance construction module includes a first extended model instance construction module and a second extended model instance construction module; the first extended model instance construction module is configured to construct the first extended model, which is an extended model instance from the source language through an intermediate language to the target language, and the second extended model instance construction module is configured to construct the second extended model, which is an extended model instance from the source language to the target language.
Preferably, the semantic fitness evaluation module includes:
an intersection computation module, configured to obtain the intersection of the target background semantic model instance with the first extended model instance, and the intersection of the target background semantic model instance with the second extended model instance;
a background semantic correlation computation module, configured to compute the background semantic correlation between the target background semantic model instance and the first extended model instance, and between the target background semantic model instance and the second extended model instance;
a sentence-structure semantic correlation module, configured to compute the sentence-structure semantic correlation between the target context structural model instance and the first extended model instance, and between the target context structural model instance and the second extended model instance;
an overall text correlation evaluation module, configured to compute the overall text correlation between the target background semantic model instance / target context structural model instance and the first extended model instance, and between the target background semantic model instance / target context structural model instance and the second extended model instance, and then to judge, according to the computed overall text correlation, whether the third-party language target text semantically matches the current source background semantic model instance.
Preferably, the intersection computation module performs its computation using the threshold similarity intersection comparison function TSCC(δ, α, A, B) combined with the intersection semantic extremum function cSMVF(A, B, α);
the background semantic correlation computation module performs its computation using the intersection semantic extremum function cSMVF(A, B, α);
the sentence-structure semantic correlation computation module performs its computation using the context structural model semantic evaluation function cMSSF(A, B, α);
the overall text correlation evaluation module performs its computation using the overall text correlation evaluation function SSRF(A, B, C, D);
where A, B, C and D are set variables, δ is the cardinality of the intersection of sets A and B, α is the similarity threshold, and δ and α are adjustable parameters.
Compared with the prior art, the present invention enables semantic matching and retrieval of text expressed in a third-party language on the basis of a source language (such as Chinese), broadening the scope of semantic matching and retrieval and significantly expanding the available sources of knowledge.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the semantic fitness evaluation method of the present invention;
Fig. 2 is a schematic structural diagram of the source background semantic model of the present invention;
Fig. 3 is a schematic structural diagram of the expanded background semantic model of the present invention;
Fig. 4 is a schematic structural diagram of the second foreign-language translation word list of the present invention;
Fig. 5 is a schematic structural diagram of the target background semantic model of the present invention;
Fig. 6 is a schematic structural diagram of the target context structural model of the present invention;
Fig. 7 is a schematic structural diagram of the notional word structural array of the present invention;
Fig. 8 is a schematic flow chart of step S4 of the present invention.
Detailed description of the embodiments
The technical solutions of the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.
The semantic fitness evaluation method and device for third-party language text disclosed in the embodiments of the present invention perform semantic matching and retrieval of third-party language text with Chinese as the source language; the source language is, of course, not limited to Chinese and may be another language.
As shown in Fig. 1, the semantic fitness evaluation method for third-party language text disclosed in the embodiment of the present invention comprises the following steps:
S1, establishing a Chinese background semantic model instance.
Specifically, in the embodiment of the present invention, the Chinese background semantic model is defined with the structure shown in Fig. 2 and is represented in tabular form. The table includes a first category and at least one first model attribute. The first category is represented by the variable XXX and denotes the application field to which the whole Chinese background semantic model belongs; each first model attribute is represented by a two-tuple (KEYWORD_i, WEIGHT_i), where KEYWORD_i is a Chinese keyword and WEIGHT_i is the weight of KEYWORD_i within the whole Chinese background semantic model.
Establishing the Chinese background semantic model instance means instantiating the category XXX, the source-language keywords KEYWORD_i and the weights WEIGHT_i of the model as concrete string values.
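As an illustration of this instantiation step only, the following Python sketch encodes the Fig. 2 structure as a small data class; the application field, keywords and weights shown are invented placeholders, and the class and field names are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SourceBackgroundSemanticModel:
    """Source (Chinese) background semantic model of Fig. 2: an application
    field XXX plus first model attributes (KEYWORD_i, WEIGHT_i)."""
    application_field: str                   # XXX: application field of the whole model
    attributes: List[Tuple[str, float]]      # [(KEYWORD_i, WEIGHT_i), ...]

# Instantiation = filling the structure with concrete string values.
# The field and keywords below are invented placeholders.
src_model = SourceBackgroundSemanticModel(
    application_field="新能源汽车",
    attributes=[("电池", 0.5), ("续航", 0.3), ("充电桩", 0.2)],
)
```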
S2, establishing a foreign-language expanded background semantic model instance corresponding to the Chinese background semantic model.
Specifically, establishing the foreign-language expanded background semantic model instance comprises two processes: establishing a first extended model instance and a second extended model instance. The first extended model instance is an extended model instance from the source language through an intermediate language (such as English) to the third-party language (en2rdEMI); the second is an extended model instance established directly from Chinese to the third-party language (rdEMI).
In the embodiment of the present invention, the first extended model and the second extended model are defined with the structures shown in Fig. 3 and Fig. 4, likewise in tabular form though not limited to tables. The table shown in Fig. 3 includes a second category and at least one second model attribute. The second category is represented by the variable XXX-List and denotes the first foreign-language translation word list, which corresponds to the application field (XXX) of the Chinese background semantic model. Each second model attribute is represented by a triple (KEYWORDLIST_i, WEIGHT_i, α_i), where KEYWORDLIST_i is the second foreign-language translation word list corresponding to the Chinese keyword KEYWORD_i, the translation words in the list being synonymous with KEYWORD_i; WEIGHT_i is the weight of KEYWORD_i within the whole Chinese background semantic model; and α_i is the similarity threshold between the second foreign-language translation word list KEYWORDLIST_i and the corresponding Chinese keyword KEYWORD_i. Typically, the semantic distance between the second foreign-language translation word list KEYWORDLIST_i and the Chinese keyword KEYWORD_i is less than α_i, and the smaller the semantic distance between the two, the closer they are semantically.
Preferably, the second foreign-language translation word list KEYWORDLIST_i is defined with the structure shown in Fig. 4, likewise represented in tabular form though not limited to a table. The table includes at least one unit structure, each unit structure being represented by a two-tuple (KEYWORD_i, Property_i), where KEYWORD_i is the English keyword or third-party target-language keyword corresponding to the Chinese keyword in the Chinese background semantic model, and Property_i is the part of speech of KEYWORD_i.
Establishing the foreign-language expanded background semantic model instances (i.e., establishing the extended model instance from Chinese through English to the third-party language and the extended model instance from Chinese directly to the third-party language) means instantiating the first foreign-language translation word list XXX-List, the second foreign-language translation word lists KEYWORDLIST_i, the weights WEIGHT_i and the similarity thresholds α_i of the model as concrete string values in the third-party language, yielding en2rdEMI and rdEMI.
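A corresponding illustrative sketch of the Fig. 3 / Fig. 4 structure is given below. The German translation words, parts of speech, weights and similarity thresholds are invented placeholders used only to show the (KEYWORDLIST_i, WEIGHT_i, α_i) layout; none of the names are defined by the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple

# One second-model attribute: (KEYWORDLIST_i, WEIGHT_i, alpha_i), where
# KEYWORDLIST_i is itself a list of (translated keyword, part of speech) pairs.
SecondAttribute = Tuple[List[Tuple[str, str]], float, float]

@dataclass
class ExtendedModel:
    """Expanded background semantic model of Fig. 3 / Fig. 4 (same shape for
    both en2rdEMI and rdEMI)."""
    translation_field: str                   # XXX-List: first foreign-language translation word list label
    attributes: List[SecondAttribute]

# Hypothetical rdEMI instance with German as the third-party language;
# translation words, weights and thresholds are invented placeholders.
rd_emi = ExtendedModel(
    translation_field="Elektroauto",
    attributes=[
        ([("Batterie", "noun"), ("Akku", "noun")], 0.5, 0.7),
        ([("Reichweite", "noun")], 0.3, 0.7),
        ([("Ladesäule", "noun")], 0.2, 0.7),
    ],
)
```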
S3, establishing the background semantic model instance (rdBMI) and the context structural model instance (rdsCMI) of the third-party language target text.
Specifically, in the embodiment of the present invention, the background semantic model of the third-party language target text is defined with the structure shown in Fig. 5 and is again represented in tabular form. The table shown in Fig. 5 includes the notional word list of the third-party language target text and at least one third model attribute; the target text notional word list is also represented by the variable XXX-List. Each attribute is represented by a triple (NotationalWord_i, Property_i, WFrequency_i), where NotationalWord_i is a notional word in the third-party language target text, Property_i is the part of speech of NotationalWord_i, and WFrequency_i is the word frequency of NotationalWord_i in the third-party language target text. The target text notional words NotationalWord_i in the target text notional word list are preferably arranged in descending order of word frequency WFrequency_i.
Establishing the background semantic model instance of the third-party language target text means instantiating the target text notional word list XXX-List, the notional words NotationalWord_i in the target text, the parts of speech Property_i and the word frequencies WFrequency_i of the model as concrete string values in the third-party language, yielding rdBMI.
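The following sketch shows one way such an rdBMI instance could be assembled from already-tagged target-text tokens. The tokenizer / POS tagger itself is assumed to exist outside the sketch, all words shown are invented examples, and the function name is not from the patent.

```python
from collections import Counter
from typing import List, Tuple

def build_target_background_model(tagged_tokens: List[Tuple[str, str]]):
    """Assemble an rdBMI-style list of triples (NotationalWord_i, Property_i,
    WFrequency_i), sorted by descending word frequency. `tagged_tokens` is
    assumed to be the output of a third-party-language tokenizer/POS tagger,
    already filtered down to notional (content) words."""
    freq = Counter(word for word, _pos in tagged_tokens)
    pos_of = {word: pos for word, pos in tagged_tokens}      # one POS per surface form, for simplicity
    return sorted(
        ((word, pos_of[word], count) for word, count in freq.items()),
        key=lambda triple: triple[2],
        reverse=True,                                         # descending word frequency
    )

# Hypothetical tagged German target-text tokens:
rd_bmi = build_target_background_model(
    [("Batterie", "noun"), ("Reichweite", "noun"), ("Batterie", "noun"), ("laden", "verb")]
)
# -> [('Batterie', 'noun', 2), ('Reichweite', 'noun', 1), ('laden', 'verb', 1)]
```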
In the embodiment of the present invention, the context structural model of the third-party language target text is defined with the structures shown in Fig. 6 and Fig. 7, also represented in tabular form. The table shown in Fig. 6 includes the notional word list of the third-party language target text and at least one fourth model attribute; the target text notional word list is likewise represented by the variable XXX-List. Each fourth model attribute is represented by NotationalWordArrayList_i, which denotes the notional word structural array of one complete clause in the target text.
Preferably, as shown in Fig. 7, the notional word structural array NotationalWordArrayList_i of a complete clause in the target text includes multiple unit structures, each unit structure being represented by a triple (NotationalWord_i, Property_i, WFrequency_i), where NotationalWord_i is a notional word in the third-party language target text, Property_i is the part of speech of NotationalWord_i, and WFrequency_i is the word frequency of NotationalWord_i in the third-party language target text. The target text notional words NotationalWord_i in the target text notional word list are preferably arranged in descending order of word frequency WFrequency_i.
Establishing the context structural model instance of the third-party language target text means instantiating the notional word list XXX-List and the notional word structural arrays NotationalWordArrayList_i of the model as concrete string values in the third-party language, yielding rdsCMI.
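Analogously, a minimal sketch of building an rdsCMI-style instance from clause-segmented, tagged notional words is given below; clause segmentation and tagging are assumed to be done elsewhere, and the example words are invented.

```python
from collections import Counter
from typing import List, Tuple

def build_target_context_model(clauses: List[List[Tuple[str, str]]]):
    """Assemble an rdsCMI-style structure: one notional word structural array
    per complete clause, each element a triple (NotationalWord_i, Property_i,
    WFrequency_i) with the frequency counted over the whole text."""
    freq = Counter(word for clause in clauses for word, _pos in clause)
    return [[(word, pos, freq[word]) for word, pos in clause] for clause in clauses]

# Hypothetical clause-split German target text:
rds_cmi = build_target_context_model([
    [("Batterie", "noun"), ("laden", "verb")],
    [("Reichweite", "noun"), ("Batterie", "noun")],
])
# -> [[('Batterie', 'noun', 2), ('laden', 'verb', 1)],
#     [('Reichweite', 'noun', 1), ('Batterie', 'noun', 2)]]
```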
S4, computing the semantic distances from the background semantic model instance (rdBMI) and the context structural model instance (rdsCMI) of the third-party language target text to the foreign-language expanded background semantic model instances (rdEMI and en2rdEMI), and evaluating on that basis whether the semantics of the third-party language text match the provided Chinese semantics.
Specifically, as shown in Fig. 8, S4 comprises the following steps:
S401, obtaining the intersection of the background semantic model instance (rdBMI) of the third-party language target text with the second extended model instance (rdEMI), and the intersection of rdBMI with the first extended model instance (en2rdEMI).
Specifically, on the basis that the intersection semantic extremum function cSMVF(A, B, α) satisfies the required condition, the threshold similarity intersection comparison function TSCC(δ, α, A, B) is used to obtain the intersection of rdBMI with rdEMI and the intersection of rdBMI with en2rdEMI. The computed intersection of rdBMI and rdEMI is denoted Φ1, and the intersection of rdBMI and en2rdEMI is denoted Φ2, i.e.:
TSCC(δ, α, rdBMI, rdEMI) = Φ1
TSCC(δ, α, rdBMI, en2rdEMI) = Φ2
If Φ1 is empty or Φ2 is empty, the current third-party language target text does not semantically match the current Chinese background semantic model instance and the computation stops; otherwise, the method proceeds to the next step, S402.
The threshold similarity intersection comparison function TSCC(δ, α, A, B) is defined by the conditions cSMVF(A, B, α) ≥ α and |A ∩ B| ≥ δ, where δ is the cardinality of the intersection of sets A and B, α is the similarity threshold, and δ and α are adjustable parameters.
The intersection semantic extremum function cSMVF(A, B, α) is defined over a_k ∈ A, where |A| denotes the cardinality of A, and over b_i, b_m ∈ B with 1 ≤ m, i ≤ |B|, such that SemanticDisVal(a_k, b_i) ≥ SemanticDisVal(a_k, b_m) and SemanticDisVal(a_k, b_i) ≥ α; that is, for each a_k, b_i is the element of B most semantically similar to a_k, and that similarity is at least α.
SemanticDisVal(a_i, b_i) is defined as the semantic similarity value of keywords a_i and b_i computed from their semantic distance in the keyword ontology library.
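Because the exact formulas for TSCC and cSMVF appear only as images in the original and are not reproduced in this text, the sketch below encodes just the stated conditions (best match per keyword, similarity at least α, at least δ matched keywords). The function names, signatures and the toy similarity function are assumptions, not the patent's definitions.

```python
from typing import Callable, Dict, List, Tuple

def csmvf(A: List[str], B: List[str], alpha: float,
          sim: Callable[[str, str], float]) -> Dict[str, Tuple[str, float]]:
    """Illustrative reading of cSMVF(A, B, alpha): for each a_k in A, pick the
    b_i in B with the highest ontology-based similarity SemanticDisVal(a_k, b_i),
    keeping it only if that similarity is at least alpha."""
    best = {}
    for a in A:
        if not B:
            continue
        b, s = max(((b, sim(a, b)) for b in B), key=lambda pair: pair[1])
        if s >= alpha:
            best[a] = (b, s)
    return best

def tscc(delta: int, alpha: float, A: List[str], B: List[str],
         sim: Callable[[str, str], float]) -> Dict[str, Tuple[str, float]]:
    """Illustrative reading of TSCC(delta, alpha, A, B): the thresholded
    intersection is kept only if at least delta keywords of A have a match
    in B whose similarity reaches alpha."""
    matches = csmvf(A, B, alpha, sim)
    return matches if len(matches) >= delta else {}

# Toy similarity standing in for the keyword-ontology semantic distance:
toy_sim = lambda a, b: 1.0 if a == b else 0.0
print(tscc(1, 0.7, ["battery", "range"], ["battery", "charger"], toy_sim))
# -> {'battery': ('battery', 1.0)}
```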
S402, computing the background semantic correlation between the background semantic model instance (rdBMI) of the third-party language target text and the second extended model instance (rdEMI), and between rdBMI and the first extended model instance (en2rdEMI).
Specifically, the background semantic correlations of rdBMI with rdEMI and of rdBMI with en2rdEMI are computed using the intersection semantic extremum function cSMVF(A, B, α) defined above, which is not repeated here.
S403, computing the sentence-structure semantic correlation between the context structural model instance (rdsCMI) of the third-party language target text and the second extended model instance (rdEMI), and between rdsCMI and the first extended model instance (en2rdEMI).
Specifically, the sentence-structure semantic correlations of rdsCMI with rdEMI and of rdsCMI with en2rdEMI are computed using the context structural model semantic evaluation function cMSSF(A, B, α).
The context structural model semantic evaluation function cMSSF(A, B, α) is defined with the following quantities: n is the number of notional word structural arrays NotationalWordArrayList_i under the attributes of the rdsCMI model structure; SSSF is the function that computes the structural semantics of a single sentence with reference to MODEL2; δ is a tuning parameter; and SentenceNumber_Rd is the actual number of individual sentences contained in the third-party web text.
Within SSSF, n is the length of the array Array_keywords_i; θ is the weight (i.e., WEIGHT_i in the unit structures of Fig. 3) of the word in MODEL2, the foreign-language expanded background semantic model, that is most similar to Nw_i; Nw_i is a notional word element of Array_keywords_i; and ε takes the value ρ if the part of speech of Nw_i is identical to that of its most similar word, and the value σ otherwise, where ρ and σ are tuning parameters.
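The cMSSF formula itself is likewise not reproduced in this text, so the following sketch only combines the quantities named above (θ, ε with values ρ/σ, per-clause normalization, division by SentenceNumber_Rd); the tuning parameter δ is omitted because its role in the unreproduced formula is unclear. Treat it as an illustrative reading under those assumptions, not the patented function.

```python
from typing import Callable, List, Optional, Tuple

Clause = List[Tuple[str, str, int]]      # (NotationalWord_i, Property_i, WFrequency_i)
Model2Entry = Tuple[str, str, float]     # (translated keyword, part of speech, WEIGHT_i)

def cmssf(rds_cmi: List[Clause], model2: List[Model2Entry], alpha: float,
          sim: Callable[[str, str], float], rho: float = 1.0, sigma: float = 0.5,
          sentence_number_rd: Optional[int] = None) -> float:
    """Illustrative reading of cMSSF(A, B, alpha): per clause, each notional word
    contributes theta (weight of its most similar MODEL2 word, if similarity >= alpha)
    scaled by epsilon (rho on a part-of-speech match, sigma otherwise); clause scores
    are normalized by clause length and averaged over the sentence count."""
    if sentence_number_rd is None:
        sentence_number_rd = max(len(rds_cmi), 1)
    total = 0.0
    for clause in rds_cmi:
        n = max(len(clause), 1)
        clause_score = 0.0
        for word, pos, _freq in clause:
            keyword, kw_pos, theta = max(model2, key=lambda entry: sim(word, entry[0]))
            if sim(word, keyword) >= alpha:                 # only sufficiently similar words count
                epsilon = rho if pos == kw_pos else sigma   # part-of-speech agreement factor
                clause_score += theta * epsilon
        total += clause_score / n                           # SSSF: per-sentence structural semantics
    return total / sentence_number_rd

# Toy call: one clause, one MODEL2 entry, exact-match similarity.
toy_sim = lambda a, b: 1.0 if a == b else 0.0
print(cmssf([[("Batterie", "noun", 2)]], [("Batterie", "noun", 0.5)], 0.7, toy_sim))  # -> 0.5
```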
S404, computing the overall text correlation between the background semantic model instance (rdBMI) / context structural model instance (rdsCMI) of the third-party language target text and the second extended model instance (rdEMI), and between rdBMI/rdsCMI and the first extended model instance (en2rdEMI); then judging, according to the computed overall text correlation, whether the third-party language target text semantically matches the current Chinese background semantic model instance.
Specifically, the overall text correlations of rdBMI/rdsCMI with rdEMI and of rdBMI/rdsCMI with en2rdEMI are computed using the overall text correlation evaluation function SSRF(A, B, C, D).
The overall text correlation evaluation function SSRF(A, B, C, D) is defined such that if SSRF(rdsCMI, rdBMI, rdEMI, en2rdEMI) ≥ ω, the semantics of the third-party language target text are considered to match the semantics of the current Chinese background semantic model instance; otherwise the two are considered not to match, where ω is an adjustable parameter.
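The SSRF formula is also not reproduced in the text; the sketch below therefore stands in a simple average of the four per-pair correlation scores from S402 and S403 and compares it with the adjustable threshold ω, purely to illustrate the final decision step. The averaging is an assumption, not the patent's combination rule.

```python
def ssrf_match(bg_corr_rd: float, bg_corr_en2rd: float,
               struct_corr_rd: float, struct_corr_en2rd: float,
               omega: float = 0.6) -> bool:
    """Hedged stand-in for the SSRF-based decision: average the background-semantic
    correlations (S402) and sentence-structure correlations (S403) computed against
    rdEMI and en2rdEMI, and declare a match when the combined score reaches the
    adjustable threshold omega."""
    combined = (bg_corr_rd + bg_corr_en2rd + struct_corr_rd + struct_corr_en2rd) / 4.0
    return combined >= omega

# Example with made-up correlation scores:
print(ssrf_match(0.8, 0.7, 0.65, 0.6))   # True with the default omega = 0.6
```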
It should be noted that, throughout the above, i = 1, 2, ..., n, where i and n are natural numbers greater than or equal to 1.
The semantic fitness evaluation device for third-party language text provided by the present invention comprises:
a first model instance construction module, configured to establish the source background semantic model instance and, in an automatically guided manner, lead the user to create the semantic features of the Chinese knowledge to be matched.
Specifically, the structure of the Chinese background semantic model constructed by the first model instance construction module is as described in detail above; the first model instance construction module instantiates the application field, source-language keywords and weights of the model as concrete string values.
a second model instance construction module, configured to establish the expanded background semantic model instance corresponding to the source background semantic model.
Specifically, the second model instance construction module includes a first extended model instance construction module and a second extended model instance construction module. The first extended model instance construction module constructs the first extended model, an extended model instance from the source language through an intermediate language to the target language; the second extended model instance construction module constructs the second extended model, an extended model instance from the source language directly to the target language.
The first extended model and the second extended model have the same structure, as described above. The second model instance construction module instantiates the first foreign-language translation word list, the second foreign-language translation word lists, the weights and the similarity thresholds of the model as concrete string values in the third-party language.
a third model instance construction module, configured to establish the target background semantic model instance and the context structural model instance.
The structures of the target background semantic model and the context structural model are as described above. The third model instance construction module instantiates the target text notional word list, the target text notional words NotationalWord_i, the parts of speech Property_i and the word frequencies WFrequency_i of the target background semantic model as concrete string values in the third-party language, and instantiates the target text notional word list and the notional word structural arrays NotationalWordArrayList_i of the target context structural model as concrete string values in the third-party language.
a semantic fitness evaluation module, configured to compute the semantic distances from the target background semantic model instance and the target context structural model instance to the expanded background semantic model instance, and to evaluate, according to the computed semantic distances, whether the semantics of the third-party language target text match the provided source-language semantics.
Specifically, the semantic fitness evaluation module includes:
an intersection computation module, configured to obtain the intersection of the target background semantic model instance with the first extended model instance, and the intersection of the target background semantic model instance with the second extended model instance;
a background semantic correlation computation module, configured to compute the background semantic correlation between the target background semantic model instance and the first extended model instance, and between the target background semantic model instance and the second extended model instance;
a sentence-structure semantic correlation module, configured to compute the sentence-structure semantic correlation between the target context structural model instance and the first extended model instance, and between the target context structural model instance and the second extended model instance;
an overall text correlation evaluation module, configured to compute the overall text correlation between the target background semantic model instance / target context structural model instance and the first extended model instance, and between the target background semantic model instance / target context structural model instance and the second extended model instance, and then to judge, according to the computed overall text correlation, whether the third-party language target text semantically matches the current source background semantic model instance.
Preferably, the intersection computation module performs its computation using the threshold similarity intersection comparison function TSCC(δ, α, A, B) combined with the intersection semantic extremum function cSMVF(A, B, α).
The background semantic correlation computation module performs its computation using the intersection semantic extremum function cSMVF(A, B, α).
The sentence-structure semantic correlation computation module performs its computation using the context structural model semantic evaluation function cMSSF(A, B, α).
The overall text correlation evaluation module performs its computation using the overall text correlation evaluation function SSRF(A, B, C, D).
Here A, B, C and D are set variables, δ is the cardinality of the intersection of sets A and B, α is the similarity threshold, and δ and α are adjustable parameters.
These functions have been described in detail in the evaluation method above and are not repeated here.
The technical content and technical features of the present invention have been disclosed above; however, those skilled in the art may still make various substitutions and modifications based on the teachings and disclosure of the present invention without departing from its spirit. Therefore, the scope of protection of the present invention should not be limited to the content disclosed in the embodiments, but should include the various substitutions and modifications that do not depart from the present invention and that are covered by the claims of this patent application.

Claims (14)

1. A semantic fitness evaluation method, characterized by comprising:
S1, establishing a source background semantic model instance;
S2, establishing an expanded background semantic model instance corresponding to the source background semantic model;
S3, establishing a target background semantic model instance and a target context structural model instance;
S4, computing the semantic distances from the target background semantic model instance and the target context structural model instance to the expanded background semantic model instance, and evaluating, according to the computed semantic distances, whether the semantics of the third-party language target text match the provided source-language semantics.
2. The method according to claim 1, characterized in that the source background semantic model includes an application field of the model and at least one first model attribute; each first model attribute is represented by a two-tuple (KEYWORD_i, WEIGHT_i), where KEYWORD_i is a source-language keyword, WEIGHT_i is the weight of KEYWORD_i within the whole source background semantic model, i = 1, 2, ..., n, and i and n are natural numbers greater than or equal to 1; and establishing the source background semantic model instance comprises: instantiating the application field, source-language keywords and weights of the model as concrete string values.
3. The method according to claim 2, characterized in that establishing the expanded background semantic model instance comprises establishing a first extended model instance and a second extended model instance, wherein the first extended model is an extended model from the source language through an intermediate language to the target language, and the second extended model is an extended model from the source language to the target language.
4. The method according to claim 3, characterized in that the first extended model and the second extended model have the same structure, each comprising a first foreign-language translation word list and at least one second model attribute; the first foreign-language translation word list corresponds to the application field of the source background semantic model; each second model attribute is represented by a triple (KEYWORDLIST_i, WEIGHT_i, α_i), where KEYWORDLIST_i is a second foreign-language translation word list corresponding to KEYWORD_i, WEIGHT_i is the weight of KEYWORD_i within the whole source background semantic model, α_i is the similarity threshold between the source-language keyword and its foreign-language translation words, i = 1, 2, ..., n, and i and n are natural numbers greater than or equal to 1; and establishing the expanded background semantic model instance comprises: instantiating the first foreign-language translation word list, the second foreign-language translation word lists, the weights and the similarity thresholds of the model as concrete string values in the third-party language.
5. The method according to claim 4, characterized in that the second foreign-language translation word list KEYWORDLIST_i includes at least one unit structure, each unit structure being represented by a two-tuple (KEYWORD_i, Property_i), where KEYWORD_i is an intermediate-language keyword or a third-party language keyword corresponding to the source-language keyword in the source background semantic model, Property_i is the part of speech of KEYWORD_i, i = 1, 2, ..., n, and i and n are natural numbers greater than or equal to 1.
6. The method according to claim 1, characterized in that the target background semantic model includes a target text notional word list arranged in descending order of word frequency and at least one third model attribute; each third model attribute is represented by a triple (NotationalWord_i, Property_i, WFrequency_i), where NotationalWord_i is a target text notional word, Property_i is the part of speech of NotationalWord_i, WFrequency_i is the word frequency of NotationalWord_i in the target text, i = 1, 2, ..., n, and i and n are natural numbers greater than or equal to 1; and establishing the target background semantic model instance comprises: instantiating the target text notional word list, the target text notional words NotationalWord_i, the parts of speech Property_i and the word frequencies WFrequency_i of the model as concrete string values in the third-party language.
7. The method according to claim 1, characterized in that the target context structural model includes a target text notional word list arranged in descending order of word frequency and at least one fourth model attribute; each fourth model attribute is represented by NotationalWordArrayList_i, where NotationalWordArrayList_i denotes the notional word structural array of one complete clause in the target text, i = 1, 2, ..., n, and i and n are natural numbers greater than or equal to 1; and establishing the target context structural model instance comprises: instantiating the target text notional word list and the notional word structural arrays NotationalWordArrayList_i of the model as concrete string values in the third-party language.
8. The method according to claim 7, characterized in that the notional word structural array NotationalWordArrayList_i includes multiple unit structures, each unit structure being represented by a triple (NotationalWord_i, Property_i, WFrequency_i), where NotationalWord_i is a target text notional word, Property_i is the part of speech of NotationalWord_i, WFrequency_i is the word frequency of NotationalWord_i in the target text, i = 1, 2, ..., n, and i and n are natural numbers greater than or equal to 1.
9. The method according to claim 4, characterized in that S4 comprises:
S401, obtaining the intersection of the target background semantic model instance with the first extended model instance, and the intersection of the target background semantic model instance with the second extended model instance, and proceeding to S402 if neither intersection is empty;
S402, computing the background semantic correlation between the target background semantic model instance and the first extended model instance, and between the target background semantic model instance and the second extended model instance;
S403, computing the sentence-structure semantic correlation between the target context structural model instance and the first extended model instance, and between the target context structural model instance and the second extended model instance;
S404, computing the overall text correlation between the target background semantic model instance / target context structural model instance and the first extended model instance, and between the target background semantic model instance / target context structural model instance and the second extended model instance, and then judging, according to the computed overall text correlation, whether the third-party language target text semantically matches the current source background semantic model instance.
10. The method according to claim 9, characterized in that:
in step S401, the computation uses the threshold similarity intersection comparison function TSCC(δ, α, A, B) combined with the intersection semantic extremum function cSMVF(A, B, α);
in step S402, the computation uses the intersection semantic extremum function cSMVF(A, B, α);
in step S403, the computation uses the context structural model semantic evaluation function cMSSF(A, B, α);
in step S404, the computation uses the overall text correlation evaluation function SSRF(A, B, C, D);
where A, B, C and D are set variables, δ is the cardinality of the intersection of sets A and B, α is the similarity threshold, and δ and α are adjustable parameters.
11. A semantic fitness evaluation device for third-party language text, characterized by comprising:
a first model instance construction module, configured to establish a source background semantic model instance and, in an automatically guided manner, lead the user to create the semantic features of the Chinese knowledge to be matched;
a second model instance construction module, configured to establish an expanded background semantic model instance corresponding to the source background semantic model;
a third model instance construction module, configured to establish a target background semantic model instance and a context structural model instance;
a semantic fitness evaluation module, configured to compute the semantic distances from the target background semantic model instance and the target context structural model instance to the expanded background semantic model instance, and to evaluate, according to the computed semantic distances, whether the semantics of the third-party language target text match the provided source-language semantics.
12. The device according to claim 11, characterized in that the second model instance construction module includes a first extended model instance construction module and a second extended model instance construction module; the first extended model instance construction module is configured to construct the first extended model, which is an extended model instance from the source language through an intermediate language to the target language, and the second extended model instance construction module is configured to construct the second extended model, which is an extended model instance from the source language to the target language.
13. The device according to claim 12, characterized in that the semantic fitness evaluation module includes:
an intersection computation module, configured to obtain the intersection of the target background semantic model instance with the first extended model instance, and the intersection of the target background semantic model instance with the second extended model instance;
a background semantic correlation computation module, configured to compute the background semantic correlation between the target background semantic model instance and the first extended model instance, and between the target background semantic model instance and the second extended model instance;
a sentence-structure semantic correlation module, configured to compute the sentence-structure semantic correlation between the target context structural model instance and the first extended model instance, and between the target context structural model instance and the second extended model instance;
an overall text correlation evaluation module, configured to compute the overall text correlation between the target background semantic model instance / target context structural model instance and the first extended model instance, and between the target background semantic model instance / target context structural model instance and the second extended model instance, and then to judge, according to the computed overall text correlation, whether the third-party language target text semantically matches the current source background semantic model instance.
14. The device according to claim 13, characterized in that:
the intersection computation module performs its computation using the threshold similarity intersection comparison function TSCC(δ, α, A, B) combined with the intersection semantic extremum function cSMVF(A, B, α);
the background semantic correlation computation module performs its computation using the intersection semantic extremum function cSMVF(A, B, α);
the sentence-structure semantic correlation module performs its computation using the context structural model semantic evaluation function cMSSF(A, B, α);
the overall text correlation evaluation module performs its computation using the overall text correlation evaluation function SSRF(A, B, C, D);
where A, B, C and D are set variables, δ is the cardinality of the intersection of sets A and B, α is the similarity threshold, and δ and α are adjustable parameters.
CN201710353875.3A 2017-05-18 2017-05-18 Semantic fitness evaluation method and device for third-party language text Active CN107193800B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710353875.3A CN107193800B (en) 2017-05-18 2017-05-18 Semantic fitness evaluation method and device for third-party language text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710353875.3A CN107193800B (en) 2017-05-18 2017-05-18 Semantic fitness evaluation method and device for third-party language text

Publications (2)

Publication Number Publication Date
CN107193800A (en) 2017-09-22
CN107193800B (en) 2023-09-01

Family

ID=59874196

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710353875.3A Active CN107193800B (en) 2017-05-18 2017-05-18 Semantic fitness evaluation method and device for third-party language text

Country Status (1)

Country Link
CN (1) CN107193800B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102855263A (en) * 2011-06-30 2013-01-02 富士通株式会社 Method and device for aligning sentences in bilingual corpus
CN104391842A (en) * 2014-12-18 2015-03-04 苏州大学 Translation model establishing method and system
CN106202068A (en) * 2016-07-25 2016-12-07 哈尔滨工业大学 The machine translation method of semantic vector based on multi-lingual parallel corpora

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110910334A (en) * 2018-09-15 2020-03-24 北京市商汤科技开发有限公司 Instance segmentation method, image processing device and computer readable storage medium
CN110910334B (en) * 2018-09-15 2023-03-21 北京市商汤科技开发有限公司 Instance segmentation method, image processing device and computer readable storage medium

Also Published As

Publication number Publication date
CN107193800B (en) 2023-09-01


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
CB03 Change of inventor or designer information

Inventor after: Fu Chaoyang

Inventor after: Xie Shaowen

Inventor before: Fu Chaoyang

CB03 Change of inventor or designer information
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200928

Address after: No. 2458, Dongda Road, Datuan Town, Pudong New Area, Shanghai, 201312

Applicant after: Shanghai nattl technology partnership (limited partnership)

Address before: Room 16 & 17, 3 / F, building 21, Tengfei science and Technology Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu Province

Applicant before: SUZHOU HEIYUN INFORMATION TECHNOLOGY Co.,Ltd.

TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20201228

Address after: Room 228, building 2, No. 378, East Ring Road, Suzhou Industrial Park, Jiangsu Province

Applicant after: Suzhou black cloud Intelligent Technology Co.,Ltd.

Applicant after: Shanghai heiyun Technology Co.,Ltd.

Address before: No.2458, Dongda highway, Datun Town, Pudong New Area, Shanghai 201312

Applicant before: Shanghai nattl technology partnership (limited partnership)

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant