Embodiment
The semantic-matching-driven natural language knowledge acquisition method of the present invention is implemented as follows:
Step 1:
(1a) Define the semantic matching relation between words:
Definition 1: In the lexical semantic knowledge base, the inherent semantic relation between any two notional words W_X and W_Y is called the semantic matching relation. The function match(W_X, W_Y) expresses how closely the two words are related, and its value is the semantic matching value. The semantic matching relation is independent of any concrete sentence. If no semantic matching relation exists between W_X and W_Y, then match(W_X, W_Y) = MAX, where MAX is a large constant.
(1b) Definition 2: Every notional word W_i in a sentence (except the predicate head) semantically modifies another notional word W_Gi; W_Gi is called the semantic modification target of W_i.
(1c) Definition 3: Under a specific grammatical analysis scheme A_i, let V be the predicate head, S the agent of V, and O the patient of V. Let W_i be a notional word in the sentence with W_i ∉ {S, V, O}, let W_Gi be the semantic modification target of W_i, and let match(W_i, W_Gi) denote their semantic matching value. The semantic matching value Value_Ai of the whole sentence can then be expressed by formula (1):
Here the semantic modification target of S and O is V, n is the number of notional words (not counting S, V, and O), and K_SVO and K_Wi are weight coefficients. Note that a smaller value indicates a higher degree of semantic matching.
(1d) Axiom 1 (best grammatical analysis axiom): Suppose a sentence has m grammatical analysis schemes. The scheme A_i that best conforms to semantic logic satisfies A_i = argmin(Value_Ai); that is, the grammatical analysis scheme with the minimum semantic matching value is the best grammatical analysis scheme.
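The selection described by Axiom 1 can be illustrated with a minimal Python sketch. Formula (1) is not legible in this text, so the weighted-sum form used in `sentence_value` is an assumption built only from the symbols named above (K_SVO, K_Wi, match(W_i, W_Gi)); the class `Scheme`, its field names, and the use of a single weight `k_w` for every word are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List

MAX = 1_000_000.0  # stand-in for the "large constant" MAX


@dataclass
class Scheme:
    """One grammatical analysis scheme A_i (hypothetical container)."""
    name: str
    svo_value: float              # matching value of the S-V-O core
    modifier_values: List[float]  # match(W_i, W_Gi) for the other notional words


def sentence_value(scheme: Scheme, k_svo: float = 1.0, k_w: float = 1.0) -> float:
    # ASSUMED form of formula (1): a weighted sum of the SVO value and the
    # matching values of the remaining notional words.
    return k_svo * scheme.svo_value + sum(k_w * v for v in scheme.modifier_values)


def best_scheme(schemes: List[Scheme]) -> Scheme:
    # Axiom 1: the scheme with the minimum semantic matching value wins.
    return min(schemes, key=sentence_value)


schemes = [
    Scheme("A1", svo_value=2.0, modifier_values=[1.0, 3.0]),
    Scheme("A2", svo_value=1.0, modifier_values=[1.0, MAX]),  # one unmatched word
]
chosen = best_scheme(schemes)
```

In this toy input, scheme A2 contains a word with no semantic matching relation to its target, so its value is dominated by MAX and A1 is selected.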
Step 2:
(2a) Definition 4 (basic concepts): A finite set of symbols that is fixed before the semantic knowledge base is built, that requires no semantic definition, and that the system gives special semantic interpretation during sentence analysis and reasoning. The number of basic concepts is small. The set is denoted C_WB = {W_B1, W_B2, …, W_Bk}, and by default each basic concept W_Bi inherits from the root concept.
(2b) Definition 5 (case): A special set of basic concepts expressing the deep semantic relations between an action concept and the circumstances associated with other things, denoted C_VC = {C_1, C_2, …, C_m}. For example, "reason" is a case expressing the reason for an action.
(2c) Definition 6 (modality): A special set of basic concepts expressing the execution state of an action concept, denoted C_VM = {M_1, M_2, …, M_n}. For example, "completed" is a modality expressing that the action has been completed.
(2d) Definition 7 (basic semantic relations): Semantic relations that are fixed before the semantic knowledge base is built, cannot be modified, and require special semantic interpretation during sentence analysis. They comprise the following:
R_C(V, C, W), case semantic relation: expresses that action concept V and concept W have the deep semantic relation of case C, where C ∈ C_VC;
R_M(V, M), modality semantic relation: expresses that the execution state of action concept V is M, where M ∈ C_VM;
R_AP(W, P), attribute semantic relation: expresses that concept P is an attribute of noun concept W;
R_AS(W, S), state semantic relation: expresses that concept S is a state of noun concept W;
R_D(W), inheritance semantic relation: may be used only once in the semantic definition of a concept, for example to state that W_1 inherits from W_2;
R_P(W_1, W_2), part semantic relation (finite-set description): states that the semantics of concept W_2 is part of the semantics of W_1;
R_W(W_1, W_2), whole semantic relation (finite-set description): states that the semantics of concept W_1 contains W_2;
R_VS(V, W) expresses that the agent of action V is W, and R_VO(V, W) expresses that the patient of action V is W.
(2e) Definition 8 (extended semantic relations): The set of semantic relations newly defined while the semantic knowledge base is being built; their number is unlimited. During sentence analysis, all extended semantic relations are handled in one uniform way, without special processing. Within a definition, R_R(W_1, W_2) states that the concept W being defined is a semantic relation between W_1 and W_2. Outside a definition, R_R(W, W_1, W_2) expresses that W_1 and W_2 stand in the extended semantic relation named W.
(2f) Starting from basic description logic, concept definitions are restricted and transformed by Rules 1-7 to give the concept semantic definition method used here.
Rule 1 (concept definition rule):
1) If W_B1 and W_B2 are basic concepts and R_1 and R_2 are semantic relations, then the new symbol formed from them is a concept;
2) If W_B1 and W_B2 are concepts and R_1 and R_2 are semantic relations, then the new symbol formed from them is a concept.
Rule 1 does not cover concept union, concept intersection, or quantifiers; these are handled by the rules that follow. Rules 2 and 3 convert a concept definition into a set of semantic relations and require nouns to satisfy the single-inheritance principle.
Rule 2 (concept union processing rule): If W_1 and W_2 are concepts whose nearest common ancestor concept is W_P, and a new concept W has the semantics W = W_1 ∪ W_2, then because of the single-inheritance rule W can be defined as W = R_D(W_P) ∩ (R_P(W_P, W_1) ∪ R_P(W_P, W_2)). For example, parents = R_D(person) ∩ (R_P(person, father) ∪ R_P(person, mother)).
Rule 3 (concept intersection processing rule): If W_1 and W_2 are concepts and a new concept W has the semantics W = W_1 ∩ W_2, then because of the single-inheritance rule W can be defined as W = R_D(W_1) ∩ R_W(W_1, W_2) or as W = R_D(W_2) ∩ R_W(W_2, W_1).
Rules 2 and 3 merely re-express concept union and concept intersection in another form; only the specific semantic interpretation of these two forms needs to be stipulated. This is a flexible way to handle multiple inheritance while preserving the single-inheritance principle, similar in effect to interfaces in Java, and it speeds up concept retrieval and matching.
In natural language, every noun semantically inherits, directly or indirectly, from a basic concept. To represent noun semantics clearly and accurately, noun definitions must follow the single-inheritance principle; concepts whose semantics involve multiple inheritance are handled with Rules 2 and 3. A transitive verb semantically expresses an action that one noun applies to another; an intransitive verb expresses a change in the noun itself; an adjective expresses a state or attribute of a noun or between nouns; an adverb expresses the execution status (modality) and associated circumstances (case) of an action. The semantics of all word classes in natural language can therefore be represented in a way that satisfies Rule 4.
Rule 4 (concept classification definition rule): Concepts are classified by their nature and are expressed in natural language as nouns, verbs, adjectives, and adverbs. Let Def(W) denote the definition of concept W and Num(R, W) the number of occurrences of semantic relation R in that definition. The definition of each word class should satisfy the following rules:
The single of noun inherits: satisfy condition
Concept W.
Verb: satisfy condition
Adjective: satisfy condition
Adverbial word: satisfy condition
Rule 5 (quantifier processing rule): Quantifiers (the universal quantifier and the existential quantifier) receive no special treatment. They are used as the value of the "number of times" case semantic relation to express how many times an action occurs, and as the value of the "quantity" attribute semantic relation to express the number of a noun. Because this does not affect the syntactic structure analysis of natural language, reasoning proceeds case by case according to the value of the action's "number of times" case and the noun's "quantity" attribute.
Rule 6 (instance setting rule): When defining a concept W, if a concept W_i in the definition occurs m times and these m occurrences refer to n senses {S_1, S_2, …, S_n}, then {W, W#1, …, W#n-1} can be used to distinguish the n senses; W#i may be interpreted as an instance during reasoning.
Rule 7 (polysemous word rule): Natural language contains many polysemous words. If a polysemous word W has n senses {S_1, S_2, …, S_n}, one concept is defined for each concrete sense, giving n concepts {W@1, W@2, …, W@n} that represent the n different senses.
(2g) Axiom 2: The inheritance semantic relation is transitive in one direction: a lower-level concept inherits the semantic relations of its upper-level concepts.
Theorem 1: Under the inheritance relation R_D, all nouns form a tree.
The semantics of words in natural language can be defined according to Rules 1-7 and Definitions 4-8. If each semantic relation is represented by a directed line segment, it follows from Theorem 1 and from the word definitions and rules that the semantics of a word W can be represented by a set of directed line segments in the noun tree.
Step 3:
(3a) Define the semantic matching relation between nouns:
Definition 9 (associated word set): The set of all words contained in the definition of a noun, denoted C_RW. For example, the associated word set of the noun W in Fig. 1 is
C_RW = {W, W_P, W_r1, W_r2, W_r3, W_r4, W_r5, W_v, W_vc}
In the analysis below, ∝ denotes the inheritance semantic relation: W_X ∝ W_Y means that W_X inherits from W_Y, and W ∝ W is stipulated.
(1) Basic semantic matching relation
Definition 10 (direct semantic matching relation): Words W_X and W_Y have a direct semantic matching relation if they satisfy the following condition.
Condition: let the associated word set of W_Y be C_WY, then
When the condition holds, match(W_X, W_Y) = K_T * d(W_X, W_Z).
K_T is the matching relation coefficient. It is set to different constants according to the type of the matched relation R, generally with 1 ≤ K_T ≤ 3.
For example, each word in {W_Dr1, W_Dvc, W_Dr2, W_D2, W_Dr3, W_Dr4, W_Dr5} in Fig. 1 has a direct semantic matching relation with W.
Definition 11 (inherited semantic matching relation): Words W_X and W_Y have an inherited semantic matching relation if they satisfy the following condition.
When the condition holds, match(W_X, W_Y) = match(W_X, W_Z) + d(W_Y, W_Z).
For example, the words in {W_Dr1, W_Dvc, W_Dr2, W_D2, W_Dr3, W_Dr4, W_Dr5} in Fig. 1 have an inherited semantic matching relation with W_D1 and W_D2.
The semantic distance function d(W_X, W_Y) is defined as the number of inheritance steps between two words W_X and W_Y that stand in an inheritance relation.
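A minimal Python sketch of the semantic distance function, assuming the noun tree of Theorem 1 is stored as a child-to-parent map. The map `PARENT` and its words are hypothetical toy data. The text defines d only for words that stand in an inheritance relation, so returning MAX for unrelated words is an assumption of this sketch.

```python
from typing import Dict, Optional

MAX = 10**6  # stand-in for the "large constant" MAX

# Hypothetical single-inheritance noun tree: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "thing": None,
    "person": "thing",
    "student": "person",
    "ornament": "thing",
    "ring": "ornament",
}


def d(w_x: str, w_y: str) -> int:
    """Count inheritance steps from w_x up to w_y (w_x ∝ w_y).

    Returns 0 when w_x == w_y, matching the stipulation W ∝ W.
    Returns MAX when w_x does not inherit from w_y (assumption).
    """
    steps = 0
    node: Optional[str] = w_x
    while node is not None:
        if node == w_y:
            return steps
        node = PARENT[node]
        steps += 1
    return MAX
```

With this toy tree, "student" reaches "thing" in two steps, while "ring" never reaches "person", so the second call falls back to MAX.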
(2) Containment semantic matching relation
Definition 12 (explicit semantic containment relation): If words W_X and W_Y satisfy the following condition, the relation is denoted W_X ⊙ W_Y.
Condition: there exists a concept W_Z satisfying the condition
When W_X ⊙ W_Y, match(W_X, W_Y) = K_P * d(W_X, W_Z), where K_P is the containment matching relation coefficient.
Definition 13 (implicit semantic containment relation): If words W_X and W_Y satisfy the following condition, the relation is denoted W_X ○ W_Y.
Condition: there exists a concept W_Z satisfying the condition
When W_X ○ W_Y, match(W_X, W_Y) = K_P * d(W_Z, W_Y).
Definition 14 (containment semantic matching relation): If words W_X and W_Y satisfy the following condition, the relation is denoted W_X ◎ W_Y:
When W_X ◎ W_Y, match(W_X, W_Y) = min{match(W_X, W_Z) + match(W_Z, W_Y), match(W_X, W_Y)}
Theorem 3: When words W_X and W_Y satisfy W_X ◎ W_Y, W_Y has the semantic relations of W_X.
(3b) Semantic matching relation between nouns and verbs
The noun-verb semantic matching relation falls into two classes:
1) SVO semantic matching relation: the noun may serve as the agent or the patient of the action;
2) Case semantic matching relation: the noun and the verb have a case semantic matching relation.
Let the verb be V, and let the agent noun and the patient noun in the definition of V be S_0 and O_0. Because S_0 is set at definition time to the topmost noun able to perform V, and O_0 to the topmost noun able to undergo the action, only a noun S and a noun O that stand in some relation to S_0 or O_0 can take part in action V, that is, form an SVO semantic match. There are 6 cases of SVO semantic matching. The value is denoted Value_SVO and is computed as follows:
Value_SVO = match(S, S_0) + match(O, O_0)
Definition 15 (conventional SVO semantic matching relation): satisfies the condition (S ∝ S_0) ∩ (O ∝ O_0).
Definition 16 (overloaded SVO semantic matching relation): satisfies the condition:
For nouns S and O and verb V, the SVO match is not satisfied in the definition of V, but the definitions of S and O state that they satisfy the SVO match.
Example: ring = R_D(ornament) ∩ R_VS(wear, person) ∩ R_VO(wear, ornament) ∩ R_C(wear, position, hand). Because the definition of "ring" contains R_VS(wear, person), {person, wear, ring} forms an overloaded SVO semantic matching relation.
Definition 17 (containment SVO semantic matching relation): satisfies the condition ((S ◎ S_0) ∩ (O ∝ O_0)) ∪ ((S ∝ S_0) ∩ (O ◎ O_0)).
Example: class = R_D(set) ∩ R_W(set, student). Because a "student" can "eat" a "meal" and "student" is part of "class", {class, eat, meal} forms a containment SVO semantic matching relation.
Definition 18 (similar SVO semantic matching relation): satisfies the condition ((S ∽ S_0) ∩ (O ∝ O_0)) ∪ ((S ∝ S_0) ∩ (O ∽ O_0)).
Definition 19 (metaphor SVO semantic matching relation): Under the following conditions, the sentence is presumed to contain a metaphor:
Condition 1: no noun in the whole sentence satisfies any of the preceding four kinds of SVO match.
Condition 2: the sentence contains nouns S or O satisfying ¬(S ∝ S_0) ∩ (O ∝ O_0); S is presumed to be likened to S_0.
Or Condition 3: the sentence contains nouns S or O satisfying (S ∝ S_0) ∩ ¬(O ∝ O_0); O is presumed to be likened to O_0.
For the metaphor SVO semantic matching relation, Value_SVO = K_F * (match(S, W_P) + match(O, W_P))
K_F is a weight coefficient and W_P is the nearest common ancestor of S and S_0. Because this relation is conjectural, K_F should be set relatively large to limit adverse effects.
Definition 20 (case semantic matching relation): For a noun W and a verb V that satisfy the matching condition, match(W, V) = K_C * d(W, W_C), where K_C is a weight coefficient.
(3c) Semantic matching relation between nouns and adjectives
For an adjective W_VA and a noun W_N that satisfy the matching condition, match(W_VA, W_N) = K_A * d(W_N, W), where K_A is a weight coefficient (generally K_A = 1).
(3d) Coordinate semantic matching relation
The coordinate semantic matching relation is used only to judge coordinate (parallel) structures in a sentence, in order to determine the scope of a conjunction.
Definition 21 (semantic similarity): Because noun definitions use single inheritance, two nouns W_X and W_Y may have no inheritance relation in their definitions even though, semantically, W_X may be a kind of W_Y. This corresponds to concept subsumption in description logic and is denoted W_X ∽ W_Y. The Tableau algorithm of description logic can be adapted to judge the semantic similarity relation between concepts.
Definition 22 (noun coordinate semantic matching relation): For two nouns W_X and W_Y, match(W_X, W_Y) = K_T * (d(W_X, W_E) + d(W_Y, W_E)) gives a numerical value used as heuristic information, where W_E is the nearest common ancestor node of W_X and W_Y. When W_X ∽ W_Y holds, the two nouns may also be coordinate.
Definition 23 (verb coordinate semantic matching relation): For two verbs V_X and V_Y, match(V_X, V_Y) = K_T * (d(S_X0, S_Y0) + d(O_X0, O_Y0)) gives a numerical value used as heuristic information, where {S_X0, S_Y0, O_X0, O_Y0} are the agents and patients in the definitions of V_X and V_Y.
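The noun coordinate matching value of Definition 22 can be sketched in Python as follows. This is a minimal illustration under the assumption that the noun tree is stored as a child-to-parent map; `PARENT`, the example words, and the function names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical single-inheritance noun tree: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "thing": None,
    "animal": "thing",
    "cat": "animal",
    "dog": "animal",
    "plant": "thing",
    "tree": "plant",
}


def ancestors(word: str) -> List[str]:
    """Return the chain word, parent, grandparent, ..., root."""
    chain = []
    node: Optional[str] = word
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def coordinate_match(w_x: str, w_y: str, k_t: float = 1.0) -> float:
    """K_T * (d(W_X, W_E) + d(W_Y, W_E)), W_E = nearest common ancestor."""
    chain_x = ancestors(w_x)
    for steps_y, node in enumerate(ancestors(w_y)):
        if node in chain_x:
            # The first shared node met while climbing from w_y is W_E.
            return k_t * (chain_x.index(node) + steps_y)
    raise ValueError("words do not share a root")
```

A smaller value suggests the two nouns are closer in the tree and therefore more plausible as members of one coordinate structure.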
(3e) Semantic matching relations for other word classes
Semantic matching relation of adverbs: adverbs modifying adjectives and other adverbs also give rise to very complicated situations, which are not discussed here. An adverb is assumed to match verbs, adjectives, and adverbs semantically, and match(W_1, W_2) = 0 is stipulated. Semantic matching relation of measure words: the lexicon should store the association between measure words and nouns. If measure word W can modify noun W_N, then match(W, W_N) = 0; otherwise match(W, W_N) = MAX. Semantic matching relation of pronouns: according to what the pronoun refers to, the pronoun is replaced by the corresponding noun for processing; for example, "I" is processed as "person".
(3f) Grammatical matching relation
Note in particular that the semantic matching relations above are inherent and independent of any concrete sentence. In a concrete sentence, words of certain types may modify each other even though no inherent semantic relation exists between the words themselves; the semantic modification in that sentence is then only a grammatical phenomenon (a grammatical modification relation). There are two main situations:
(1) Modification between unusual pairs of word classes: verb-verb, adverb-noun, adjective-verb, and so on, as in "like swimming" or "to be frank". These all belong to the grammatical matching relation: the words have no inherent semantic matching relation and are related only grammatically within the sentence. During sentence analysis the semantic matching value can be computed as match(W_X, W_Y) = MAX/K_G, where K_G is a type weight (generally K_G = 1, or K_G < 1.5).
(2) Flexible conversion of word class; for example, an adjective is often used as an adverb. This situation is not considered here.
Step 4:
(4a) Define the three levels of semantic structure and their grammars
To analyze sentences according to the semantic model described here, an abstract sentence representation suited to the model is required. Every sentence is formed by iteration from structurally simpler sentences, and a phrase is regarded as a component of a sentence. To meet the needs of semantic analysis, the semantic structure of sentences is divided into three levels according to its complexity and characteristics: simple sentences, special simple sentences, and complex sentences.
Definition 24 (simple sentence): A sentence C_S in which only one verb or adjective serves as predicate; it can be described abstractly by grammar G_1.
Grammar G_1 is designed following the ideas of case grammar. Design: let V be the predicate, S the agent of V, and O the patient of V; A_B is a preposed attributive; A_A is a postposed attributive; P_D is an adverbial or complement, equivalent to a group of cases in case grammar; P_C is the content of one case; n is a noun; N_P is a noun phrase.
Grammar G_1 contains many rules (the detailed rules are numerous and are omitted here). The design of its key rules is as follows:
1) C_S → P_D A_B S A_A P_D V P_D A_B O A_A P_D (S, V, and O can appear in 10 different orders; Fig. 2 shows one of them)
2) S → n | S A_A A_B S (several words serve as the agent, such as S in Fig. 2)
3) P_D → P_C | P_D P_C
The usage rules for prepositions, conjunctions, auxiliary words, numerals, measure words, and similar words within S, O, A_B, A_A, and P_C can be written out easily.
The concrete form of grammar G_1 is as follows:
Definition 25 (special simple sentence): A sentence that has several verbs or adjectives but semantically contains no subordinate clause; it can be described abstractly by grammar G_2.
Design of grammar G_2: provided that no subordinate clause can be produced, adding a small number of rules to grammar G_1 yields grammar G_2. There are two main situations:
1) several verbs or adjectives serve as the predicate;
2) a verb or adjective serves as S, O, A_B, A_A, or P_C.
The key to grammar G_2 is that a verb phrase V_V cannot be directly preceded or followed by a noun phrase N_P; that is, N_P + V_V and V_V + N_P cannot occur.
The concrete form of grammar G_2 is as follows:
Definition 26 (complex sentence): Adding the rule N_P → C_S to grammar G_2 gives grammar G_3. The rule N_P → C_S states that a simple sentence or special simple sentence can serve as any component of a complex sentence, which makes simple sentences recursive; grammar G_3 can therefore describe complex sentences.
(4b) Approach to obtaining the best grammatical analysis scheme
(1) Lexical ambiguity resolution
Suppose the words W_1, W_2, …, W_k have n_1, n_2, …, n_k senses respectively. Enumerating every combination of senses gives {L_1, L_2, …, L_M}, where M = n_1 * n_2 * … * n_k. Each L_i fixes one sense for every word W_m, so L_i is a word sequence of C_S without lexical ambiguity. During grammatical analysis, the analysis result of every sequence in {L_1, L_2, …, L_M} is computed exhaustively, and selecting the best L_i resolves the lexical ambiguity.
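The exhaustive enumeration of sense combinations can be sketched with Python's `itertools.product`. The sense labels follow the W@n notation of Rule 7; the example words and the scoring callback `value_of` are hypothetical stand-ins for the full grammatical analysis, which this sketch does not implement.

```python
from itertools import product
from typing import Callable, List, Sequence


def sense_sequences(senses_per_word: Sequence[Sequence[str]]) -> List[List[str]]:
    """Enumerate L_1 ... L_M, one sense chosen per word (M = n_1 * ... * n_k)."""
    return [list(combo) for combo in product(*senses_per_word)]


def resolve(senses_per_word: Sequence[Sequence[str]],
            value_of: Callable[[List[str]], float]) -> List[str]:
    """Pick the sequence whose analysis has the minimum semantic matching value."""
    return min(sense_sequences(senses_per_word), key=value_of)


# Hypothetical input: three words with 2, 1, and 3 senses.
senses = [
    ["bank@1", "bank@2"],
    ["open@1"],
    ["account@1", "account@2", "account@3"],
]
all_sequences = sense_sequences(senses)  # 2 * 1 * 3 = 6 unambiguous sequences
```

The number of sequences grows multiplicatively with the sense counts, which is why the text limits exhaustive search to cases where the counts are small.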
(2) Analysis approach
According to Axiom 1, all grammatical analysis schemes are obtained; for each scheme A_i, the corresponding semantic matching value is computed with formula (1), and the best grammatical analysis scheme is selected.
Definition 27 (simple clause): A substring of a sentence that satisfies grammar G_1 or G_2 is a simple clause.
Axiom 4 (semantic modification target axiom): Let the semantic modification target of a notional word W be W_Gi. Then for a simple clause C_S in the sentence that conforms to semantic logic, (W ∈ C_S) → (W_Gi ∈ C_S). For an attributive A_B (assumed adjacent to S), (W ∈ A_B) → (W_Gi ∈ (A_B ∪ S)); the other attributive cases are similar. For an adverbial or complement P_D, (W ∈ P_D) → (W_Gi ∈ (P_D ∪ V)).
According to these properties of semantic modification targets, all grammatical analysis schemes can be divided into 2 layers:
1) grammatical analysis schemes at the simple-clause level;
2) grammatical analysis schemes inside a simple clause.
(4c) Obtaining the best grammatical analysis scheme
(1) Conditions for identifying reducible simple clauses
For a sentence C_S, run the CYK algorithm for grammars G_1, G_2, and G_3; a substring s(i, j) that satisfies the conditions in Table 1 is a reducible simple clause.
Table 1. Conditions for identifying reducible simple clauses
(2) Bottom-up simple clause reduction
The best clause-level grammatical analysis scheme can be obtained by bottom-up simple clause reduction; see Algorithm 1:
Algorithm 1 (simple clause reduction):
1) For sentence C_S, use the conditions in Table 1 to find the set of substrings {s_1, s_2, …, s_m} corresponding to reducible simple clauses;
2) For each clause s_i, compute its best semantic matching value (Algorithm 2), reduce s_i to N_P, and set the reduction semantics of N_P;
3) Set C_S to the reduction result, accumulate the best semantic matching values of the simple clauses s_i reduced during the recursion, and recursively repeat steps 1-3;
4) The simple clause ranges and reduction order that yield the best whole-sentence semantic matching value constitute the best grammatical analysis scheme.
Computing the best semantic matching value of a simple clause is the key to the algorithm; the method is described under "best semantic matching value of a simple clause" below.
The algorithm selects simple clauses exhaustively and can therefore obtain the theoretically best grammatical analysis scheme. The amount of computation is large, however, and the method is hard to apply in general, although it remains practical when fewer than 4 clauses can be reduced. When too many clauses can be reduced, only the k (k < m) simple clauses with the best semantic matching degree are searched recursively, which yields a suboptimal grammatical analysis scheme.
(3) Reduction semantics
In Algorithm 1, after simple clause C_S is reduced to N_P, N_P has no semantics and the next semantic matching computation cannot proceed. This is resolved as follows:
1) Stipulate that an N_P produced by reduction can match any word W, with semantic matching value:
match(N_P, W) = MAX/K_C (generally K_C > 1)
2) If N_P serves as S or O of the new target clause, the semantics of N_P can be set to the S or O of the original C_S.
(4d) Obtaining the best semantic matching value of a simple clause
To compute the best semantic matching value of a simple clause: by Axiom 1 and Axiom 4, several grammatical analysis schemes exist inside a simple clause, and all of them must be obtained. For each scheme, the semantic modification target of every notional word is determined, so the semantic matching value under that scheme can be computed with formula (1). The scheme with the minimum semantic matching value is the required analysis result.
The grammatical analysis schemes inside a simple clause can be divided into 3 layers: 1) the SVO combination level; 2) the A_A, P_D, A_B level; 3) the grammatical analysis schemes inside A_A, P_D, and A_B. Algorithm 2 selects the best scheme among them.
Algorithm 2 (best semantic matching value of a simple clause):
1) If the simple clause is a special simple sentence, find all ways of reducing it to a simple sentence;
2) For each reduction, reduce the special simple sentence to a simple sentence;
3) For this simple sentence, find all possible SVO combinations;
4) For each SVO combination, segment the simple sentence C_S; if S or O is a phrase, carry out Algorithm 3. Suppose C_S is divided into {L_1, L_2, …, L_n};
5) Each segment L_i can contain at most the three parts A_A, P_D, and A_B; find all divisions of L_i into A_A, P_D, and A_B;
6) For each division into A_A, P_D, and A_B, combine grammatical analysis with semantic matching analysis to determine the semantic modification target of each notional word so that the semantic matching value of A_A, P_D, and A_B is minimal;
7) Compute the semantic matching value of the whole sentence and select the analysis with the minimum value as the best grammatical analysis scheme.
Suppose
is the result of running the CYK algorithm for grammar G_1 on simple clause C_S; it denotes the set of grammar symbols that can generate the substring s(i, j).
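The CYK table referred to above can be sketched in Python. This is a generic CYK recognizer, not the grammar G_1 of the text: CYK requires a grammar in Chomsky normal form, and the tiny rule set below (symbols S, V, O, VP, CS) is a hypothetical toy chosen only to show how the set of symbols for each substring s(i, j) is filled in.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def cyk(words: List[str],
        lexical: Dict[str, Set[str]],
        binary: Dict[Tuple[str, str], Set[str]]) -> Dict[Tuple[int, int], Set[str]]:
    """Return table[(i, j)] = symbols that generate words[i..j] (inclusive)."""
    n = len(words)
    table: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
    # Length-1 substrings come straight from the lexical rules.
    for i, word in enumerate(words):
        table[(i, i)] = set(lexical.get(word, ()))
    # Longer substrings combine two adjacent shorter substrings.
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point: [i..k] + [k+1..j]
                for (left, right), heads in binary.items():
                    if left in table[(i, k)] and right in table[(k + 1, j)]:
                        table[(i, j)] |= heads
    return table


# Hypothetical toy grammar in Chomsky normal form.
lexical = {"girls": {"S"}, "like": {"V"}, "ornaments": {"O"}}
binary = {("V", "O"): {"VP"}, ("S", "VP"): {"CS"}}
table = cyk(["girls", "like", "ornaments"], lexical, binary)
```

A substring whose symbol set contains the clause symbol is a candidate simple clause; the additional conditions of Table 1 are not modeled here.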
(1) Grammatical analysis schemes at the SVO combination level
In a simple sentence, if nouns W_1 and W_2 satisfy the SVO match with verb V, then {W_1, V, W_2} is an SVO combination. S (and likewise O) may, however, be a phrase. Suppose the sentence contains the SVO matches {W_1, V, W_3} and {W_2, V, W_3}, neither V nor W_3 lies between W_1 and W_2, and the substring s(m, n) between W_1 and W_2 satisfies:
Then the phrase formed by W_1 + s(m, n) + W_2 is S. A longer S or O can be obtained in the same way.
Algorithm 3 (segmentation of S or O):
1) Obtain the phrase S (or O) and, following the principle described in this step, find all nouns in S that satisfy the SVO match, say {n_1, n_2, …, n_m};
2) Use {n_1, n_2, …, n_m} to divide phrase S into m-1 segments. By the rule S → n | S A_A A_B S, each non-empty segment of S may contain A_A A_B.
(2) Division at the A_A, P_D, A_B level
Let the substring of segment L_i be s(m, n). Positions p and q that satisfy the grammatical condition give a division into A_A, P_D, and A_B, with the result: A_A = s(m, p), P_D = s(p, q), A_B = s(q, n).
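Enumerating the candidate divisions can be sketched in Python as follows. The grammatical condition on p and q is not legible in this text, so it is represented by a caller-supplied predicate `is_valid`, which is a hypothetical hook; treating s(i, j) as a half-open span so that empty parts are allowed is also an assumption of this sketch.

```python
from typing import Callable, List, Tuple

# Hypothetical predicate type: (part name, start, end) -> is the span grammatical?
Validator = Callable[[str, int, int], bool]


def divisions(m: int, n: int,
              is_valid: Validator = lambda part, i, j: True) -> List[Tuple[int, int]]:
    """List every (p, q) with m <= p <= q <= n such that
    A_A = s(m, p), P_D = s(p, q), A_B = s(q, n) all pass is_valid."""
    result = []
    for p in range(m, n + 1):
        for q in range(p, n + 1):
            if (is_valid("A_A", m, p)
                    and is_valid("P_D", p, q)
                    and is_valid("A_B", q, n)):
                result.append((p, q))
    return result


def non_empty_pd(part: str, i: int, j: int) -> bool:
    # Example constraint (assumption): P_D must contain at least one word.
    return part != "P_D" or j > i
```

In Algorithm 2, each surviving (p, q) pair would then be scored by semantic matching, and the division with the minimum value kept.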
(3) Best grammatical analysis scheme inside A_A, P_D, and A_B
Theorem 2: The best grammatical analysis scheme inside A_A, P_D, and A_B is the scheme under which every notional word in A_A, P_D, and A_B has its best semantic modification target.
Definition 28 (simple noun phrase): A noun phrase that contains no verb or adjective is a simple noun phrase.
Theorem 3: The best grammatical analysis scheme of an attributive (A_A or A_B) of a simple sentence is equivalent to the best grammatical analysis scheme of a simple noun phrase.
Proof: A simple sentence contains only one verb or adjective, which serves as predicate V, so A_A and A_B contain no verb or adjective. By Axiom 4, the grammatical analysis scheme of A_A and A_B is equivalent to that of a simple noun phrase N_P, where N_P ∈ {(A_B + S), (A_A + S), (A_B + O), (A_A + O)}.
The different grammatical analysis schemes of a simple noun phrase N_P are affected only by {conjunction/preposition/auxiliary word/measure word}. The key to the analysis is determining the scope of each {conjunction/preposition/auxiliary word/measure word} and the order in which they are reduced.
(1) Determining scope: among {conjunction/preposition/auxiliary word/measure word}, let w_B be a marker of the leading type and w_M a marker of the middle type. The scope can then be reduced to two forms: 1) ..N_Bn..N_B1..w_M..N_A1..N_Am..; 2) w_B..N_Bn..N_B1..w_M..N_A1..N_Am..; where {N_B1, N_B2, …, N_Bn} are the nouns in the first half of the scope and {N_A1, N_A2, …, N_Am} are the nouns in the second half.
Following the Chinese convention that a modifier semantically modifies what comes after it, the noun N_Aj that is grammatical and has the best semantic matching value with N_B1 can be found in {N_A1, N_A2, …, N_Am} and taken as the rear boundary of the scope. The front boundary of the scope in form 1 can be determined in a similar way.
(2) Determining the reduction order: each {conjunction/preposition/auxiliary word/measure word} together with its scope should be reduced by some grammar rule, and the best reduction order can be found by exhaustive search. In general, after a simple sentence has been segmented several times, the number n of {conjunction/preposition/auxiliary word/measure word} items contained in A_A and A_B is usually less than 4, so the computation is feasible.
Definition 29 (noun sequence): A simple noun phrase that contains no {conjunction/preposition/auxiliary word/measure word} is a noun sequence.
After every {conjunction/preposition/auxiliary word/measure word} has been reduced, the simple noun phrase becomes a noun sequence; one or two further noun sequences may also exist inside the scope of a {conjunction/preposition/auxiliary word/measure word}. In a noun sequence only the nouns affect the semantic modification relations. Following the Chinese convention that a modifier modifies what comes after it, let the noun sequence be L_N = W_1 W_2 … W_m. The (approximate) method for determining by semantics the modification target of any noun in L_N is as follows:
Algorithm 4 (best semantic modification targets in a noun sequence):
Initialize the set C_W to empty. For each noun W_i in L_N, if match(W_i, W_m) < MAX, add W_i to C_W. Then proceed as follows:
1) If the elements of C_W, in order, are W_1, W_2, …, W_n (n > 1), divide L_N into n+1 segments, set the semantic modification target of these elements to W_m, and recurse on each segment.
2) When C_W contains only one noun, carry out steps 3-4.
3) Set forward modification relations: for any segment, if there is a subsequence W_x W_x+1 … W_y-1 W_y satisfying (i) the semantic matching value between any noun in W_x+1 … W_y and any noun after W_y is MAX, and (ii) match(W_y, W_x) < MAX, then set the semantic modification target of W_y to W_x and set the modification target of the nouns in W_x+1 … W_y-1 to W_y.
4) If some noun W_y in L_N still has no modification target, set its modification target to W_y+1.
The analysis of P_D is similar to that of A_A and A_B. The key is to delimit boundaries according to prepositions; the content within a preposition's scope is likewise converted into a simple noun phrase.
(4) Handling special simple sentences
Reduce all non-predicate verbs and adjectives to convert the sentence into a simple sentence, and select the best reduction scheme. The procedure is:
1) Run the CYK algorithm for grammar G_2 on the sentence to find the verb (adjective) phrases that can serve as predicate; several schemes may exist.
2) For each scheme, reduce the remaining verbs (or adjectives) and choose the analysis scheme with the minimum semantic matching value.
Reduction semantics must also be set when a non-predicate verb (adjective) is reduced; for "beautiful necklace", for example, the reduction semantics is "necklace".
Step 5:
According to the grammatical analysis result with the best semantic matching value, a simple sentence can be converted into a knowledge point. Each simple clause of a complex sentence is converted into a knowledge point, so the whole complex sentence becomes a group of knowledge points.
Example: the knowledge five-tuples for the sentence "Green bronze ornaments shaped like insects are very popular with Brazilian girls." (see Table 2):
Table 2. Example of converting a sentence into knowledge
Once sentences have been converted into knowledge points stored as structured data, various kinds of intelligent information processing can conveniently be applied to the knowledge data.
The embodiment described above is given only to illustrate the invention clearly and does not limit the ways in which the invention may be carried out. A person of ordinary skill in the art can make other changes in different forms on the basis of the above description. It is neither necessary nor possible to list all embodiments exhaustively here, and obvious variations or modifications derived from the above remain within the scope of protection of the claims of the invention.