CN103177089A - Sentence meaning composition relationship lamination identification method based on central blocks - Google Patents

Sentence meaning composition relationship lamination identification method based on central blocks

Info

Publication number
CN103177089A
Authority
CN
China
Prior art keywords
predicate
node
relation
sentence
central block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013100749701A
Other languages
Chinese (zh)
Inventor
罗森林
魏超
潘丽敏
韩磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN2013100749701A
Publication of CN103177089A

Abstract

The invention relates to a layered identification method for sentence-meaning component relations based on central blocks, and belongs to the technical field of computer science and Chinese information processing. The method is grounded in contemporary Chinese semantics and addresses the identification of sentence-meaning component relations in a Chinese sentence-meaning structural model. It first introduces the concept of the "layer central block", which provides an effective mapping from the syntactic structure to the sentence-meaning structure. The identification of sentence-meaning component relations is divided into three layers: relations between predicates, relations between basic cases and predicates, and relations between general cases and sentence-meaning components. The invention provides a central block identification algorithm, a basic case identification algorithm, a general case identification algorithm, and identification algorithms for each of the three relation layers. With these, a computer can analyze and obtain sentence-meaning component relations with high accuracy and efficiency, which further advances research on the Chinese sentence-meaning structural model.

Description

Layered recognition method for sentence-meaning component relations based on central blocks
Technical field
The present invention relates to a layered recognition method for sentence-meaning component relations based on central blocks, and belongs to the technical field of computer science and Chinese information processing.
Background
With the explosive growth of information, people increasingly need computers to understand the meaning contained in natural language, so sentence-meaning analysis is becoming ever more urgent. Chinese, as a meaning-driven (paratactic) language, places particular weight on grasping and analyzing sentence meaning. Sentence-meaning analysis parses the meaning of a sentence: different structural forms reflect the sentence meaning (S meaning), and the analysis resolves the semantic relations between content words in the sentence structure. In other words, from the syntactic structure of the sentence and the sense of each content word, it derives a formal structure (the sentence-meaning structure) that reflects the sentence meaning. The core of sentence-meaning analysis is therefore to express the formal structure of sentence meaning, and sentence-meaning structure analysis is the most important part of semantic analysis research as a whole. At present, natural language processing generally relies on shallow sentence-meaning analysis as its underlying technology, and deeper sentence-meaning analysis has rarely been studied. Deep sentence-meaning analysis must complete sentence-meaning type identification, component identification, extraction of the sentence-meaning structural model framework, and detailed component analysis. Among these, the sentence-meaning structural model framework provides the basic framework of the whole sentence-meaning structure. It is the most important link in sentence-meaning structure analysis and also its main difficulty, yet related work is extremely rare.
Two key problems arise when extracting the sentence-meaning structural model framework: (1) making effective use of syntactic structure information for the sentence-meaning structure, and (2) transforming the framework extraction problem into a classification problem. The syntactic structure and the sentence-meaning structure express the information of a sentence at different levels, and a mapping exists between them. Finding and exploiting this mapping is particularly important for extracting the sentence-meaning structural model framework, and it is also the key to sentence-meaning structure analysis.
Summary of the invention
The present invention proposes a method for extracting the framework relations of the Chinese sentence-meaning structural model. The technical scheme comprises the following.
The invention introduces the concept of the layer central block, which makes it possible to map the syntactic structure onto the sentence-meaning structure effectively.
(1) Central block identification for every layer of phrase structure in the syntactic structure tree: for each child node of a parent node in the tree, decide whether that child is a central node.
(2) Semantic case identification, divided into the identification of predicates, basic cases, and general cases.
(3) Relation identification between sentence-meaning components, divided into three classes of relations: 1. relations between predicates; 2. relations between basic cases and predicates; 3. relations between general cases and the sentence-meaning components. Identifying the relations between general cases and the sentence-meaning components means identifying which component modifies which. Central block identification yields the modifier-modified relations among the phrase-structure child nodes in the syntax tree, and from these the modifier-modified relations among the sentence-meaning components, which are exactly the relations between general cases and the components. Identifying these three classes of relations (see Fig. 1) achieves a good correspondence between the syntactic structure and the sentence-meaning structure.
Overall, the invention is divided into the following three modules.
Step 1, central block identification
Central block definition: in the syntactic structure tree, among the child nodes of a parent node, a node that is modified is a central block and a node that modifies is a non-central block. If no modifier-modified relation exists among the child nodes of a parent node, every child node is marked as a central block. Phrase nodes that contain the subject, predicate, object, or predicative are also defined as central blocks.
If nodes of the syntactic structure tree stand in an upper-lower layer relation in the sentence-meaning structure, the node on the upper layer is the modified one and the node on the lower layer acts as the modifier. If they lie on the same layer of the sentence-meaning structure, the corresponding nodes have no modifier-modified relation.
Central block identification uses the C4.5 decision tree as its classification algorithm and comprises two processes (see Fig. 2):
Training process: train a model based on the central block definition;
Identification process: use the trained model to guide central block identification in new sentences.
The inputs and outputs of central block identification are listed in the table below.
Table 1. Central block identification input/output relations (IPO)
Figure BDA00002897807900021
When choosing the attribute for each level of decision tree nodes, the C4.5 algorithm computes the information gain ratio of every attribute and selects the attribute with the largest information gain ratio as the root node of the decision tree. Branches are created for the different values of the root attribute. The procedure then recurses: for the subset on each branch, the attribute with the largest information gain ratio again becomes the child node, until every subset contains data of a single class only.
(1) Information gain ratio
Let the training sample set be S, with classes C_i (i = 0, ..., m). Take attribute A as an example, where A has n attribute values in total (0, ..., n). Its information gain ratio is computed as shown in formulas (1) and (2).
$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}(A)} = \frac{\mathrm{Info}(S) - \mathrm{Info}_A(S)}{\mathrm{SplitInfo}(A)} \tag{1}$$
$$\mathrm{Info}(S) = -\sum_{i=0}^{m} p_i \log_2(p_i)$$
$$\mathrm{Info}_A(S) = \sum_{j=0}^{n} \frac{|S_j|}{|S|} \times \mathrm{Info}(S_j) \tag{2}$$
Here Gain(A) is the information gain obtained after splitting on attribute A, p_i is the probability that an arbitrary sample in S belongs to class C_i, S_j is the set of all samples in S that share the same value j (the text's example attribute is the verb count), and |S_j| is the number of samples in S_j.
SplitInfo(A), written SplitInfo_A(S) in formula (3), is called the split information and is computed as follows.
$$\mathrm{SplitInfo}_A(S) = -\sum_{j=0}^{n} \frac{|S_j|}{|S|} \times \log_2\!\left(\frac{|S_j|}{|S|}\right) \tag{3}$$
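The gain ratio of formulas (1) to (3) can be illustrated with a minimal Python sketch for a single discrete attribute. The function names `info` and `gain_ratio` and the list-based data layout are illustrative assumptions, not part of the patent; returning 0 when the split information is zero is also an assumption made to avoid division by zero.

```python
import math
from collections import Counter


def info(labels):
    """Info(S): entropy, in bits, of the class labels of a sample set."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def gain_ratio(values, labels):
    """GainRatio(A) for one discrete attribute A.

    values[k] is the value of A for sample k, labels[k] its class.
    """
    total = len(labels)
    # Partition the class labels by attribute value: the subsets S_j.
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    # Info_A(S): weighted entropy after splitting on A (formula 2).
    info_a = sum(len(s) / total * info(s) for s in subsets.values())
    # SplitInfo: entropy of the partition sizes themselves (formula 3).
    split_info = -sum(len(s) / total * math.log2(len(s) / total)
                      for s in subsets.values())
    if split_info == 0:
        # Assumption: an attribute with a single value cannot split S.
        return 0.0
    return (info(labels) - info_a) / split_info  # formula 1
```

An attribute that separates the classes perfectly into two equal halves gets a gain ratio of 1, while an attribute that is independent of the class gets 0.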
(2) Discretization of continuous values
1) Find the minimum value of the continuous attribute and assign it to MIN; find its maximum value and assign it to MAX;
2) Set N equally spaced breakpoints A_i in the interval [MIN, MAX], computed as shown in formula (4);
$$A_i = \mathrm{MIN} + \frac{\mathrm{MAX} - \mathrm{MIN}}{N} \times i, \quad i = 1, 2, \ldots, N \tag{4}$$
3) For each i = 1, 2, ..., N, take [MIN, A_i] and (A_i, MAX] as the two interval values, compute the corresponding GainRatio, and compare the results;
4) Choose the breakpoint A_k with the largest GainRatio as the breakpoint of the continuous attribute, and set the attribute values to the two intervals [MIN, A_k] and (A_k, MAX].
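The four discretization steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `_entropy`, `_gain_ratio`, and `best_breakpoint` are hypothetical, and the default N = 10 is an arbitrary choice since the text does not give a value for N.

```python
import math
from collections import Counter


def _entropy(items):
    """Entropy, in bits, of a list of discrete items."""
    total = len(items)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(items).values())


def _gain_ratio(bins, labels):
    """Gain ratio of a discrete split `bins` with respect to `labels`."""
    total = len(labels)
    groups = {}
    for b, y in zip(bins, labels):
        groups.setdefault(b, []).append(y)
    info_a = sum(len(g) / total * _entropy(g) for g in groups.values())
    split_info = _entropy(bins)  # entropy of the partition sizes
    if split_info == 0:          # assumption: a one-sided split scores 0
        return 0.0
    return (_entropy(labels) - info_a) / split_info


def best_breakpoint(values, labels, n=10):
    """Return (A_k, GainRatio) for the best of N equally spaced breakpoints."""
    lo, hi = min(values), max(values)              # step 1: MIN and MAX
    best_point, best_score = None, -1.0
    for i in range(1, n + 1):
        a_i = lo + (hi - lo) / n * i               # step 2: formula (4)
        # step 3: True means [MIN, A_i], False means (A_i, MAX].
        # For i = N the second interval is empty and the score is 0.
        bins = [v <= a_i for v in values]
        score = _gain_ratio(bins, labels)
        if score > best_score:                     # step 4: keep the maximum
            best_point, best_score = a_i, score
    return best_point, best_score
```

For example, `best_breakpoint([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1], n=3)` tries the breakpoints 2.0, 3.0, and 4.0 and selects 2.0, which separates the two classes.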
Step 2, basic case and general case identification
In Chinese, some semantic cases, once combined with a predicate, reflect the collocation requirements of that predicate and form the framework of the sentence meaning centered on the predicate. Such semantic cases are called basic cases.
Correspondingly, semantic cases that do not form the framework of the sentence meaning together with the predicate, but instead explain or describe that framework, are called general cases.
Step 2.1, basic case identification
Table 2. Basic case identification (IPO)
Figure BDA00002897807900041
If a leaf node is a central block, it must be either a basic case or a predicate. Predicates are supplied as known input, so once a leaf node is known to be a basic case or a predicate, comparing it against the known predicates determines whether it is a basic case.
Step 2.2, general case identification
Table 3. General case identification (IPO)
By the definitions of basic case and general case, once the basic cases and predicates in a sentence have been identified, the remaining parts are general cases and non-semantic cases.
Statistics show that interjections (e), modal particles (y), conjunctions (c), auxiliary words (u), nouns of locality (f), and prepositions (p) are mostly non-semantic cases. These features are used to identify non-semantic cases, and the general cases are then obtained by filtering the non-semantic cases out.
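The rules of Step 2 can be summarized in a small sketch. The data layout (a leaf as a `(word, pos, is_central)` tuple, predicates given as a set of leaf indices) and the names `classify_leaves` and `NON_SEMANTIC_POS` are assumptions made for illustration. Note that the text says the listed parts of speech are "mostly" non-semantic, so treating them as always non-semantic is a simplification.

```python
# Part-of-speech tags that the text reports as mostly non-semantic cases:
# interjection, modal particle, conjunction, auxiliary, locality noun, preposition.
NON_SEMANTIC_POS = {"e", "y", "c", "u", "f", "p"}


def classify_leaves(leaves, predicate_indices):
    """Label each leaf as predicate, basic case, general case, or non-semantic.

    leaves: list of (word, pos, is_central) tuples (hypothetical layout).
    predicate_indices: set of leaf positions known to be predicates (input).
    """
    labels = []
    for index, (_word, pos, is_central) in enumerate(leaves):
        if index in predicate_indices:
            labels.append("predicate")       # predicates are known input
        elif is_central:
            labels.append("basic case")      # central leaf that is not a predicate
        elif pos in NON_SEMANTIC_POS:
            labels.append("non-semantic")    # filtered out by part of speech
        else:
            labels.append("general case")    # whatever remains
    return labels
```

A usage example with placeholder tokens: for leaves `[("w0", "n", True), ("w1", "v", True), ("w2", "u", False), ("w3", "a", False)]` and predicate index set `{1}`, the sketch labels the four leaves as basic case, predicate, non-semantic, and general case.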
Step 3, relation identification between sentence-meaning components
On this basis, the relations between basic cases and predicates form the basic framework of the sentence-meaning structural model. On top of that basic framework, the relations between general cases and the sentence-meaning components make up the complete sentence-meaning structural model framework.
Relation identification between sentence-meaning components comprises relation identification between predicates, between basic cases and predicates, and between general cases and the sentence-meaning components.
Step 3.1, relation identification between predicates
Five classes of relations exist between predicates:
1. predicate "A" and predicate "B" are two sibling nodes under the same comment (Comment);
2. predicate "A" and predicate "B" are the predicates of two coordinated clauses under the same comment (Comment);
3. the clause containing predicate "A" serves as a basic case of the topic (Topic) of the clause containing predicate "B";
4. the clause containing predicate "A" serves as a basic case of the comment (Comment) of the clause containing predicate "B";
5. the clauses containing predicate "A" and predicate "B" are two clauses that together form a compound sentence.
Relation identification between predicates uses the C4.5 decision tree as its classification algorithm and comprises two processes:
Training process (see Fig. 3): train a model based on the five classes of relations between predicates;
Identification process (see Fig. 4): use the trained model to guide relation identification between the predicates of new sentences.
Table 4. Relation identification between predicates (IPO)
Figure BDA00002897807900051
Topic: the object that the sentence describes. Comment: the part of the sentence that makes a statement about that object.
Step 3.2, relation identification between basic cases and predicates
The relation between a basic case and a predicate specifies which predicate the basic case belongs to, and whether it belongs to the topic or to the comment.
Relations between basic cases and predicates fall into two classes:
(1) the basic case belongs to the topic associated with the predicate;
(2) the basic case belongs to the comment associated with the predicate.
By the definitions of topic and comment, if a basic case follows the predicate in the sentence, it belongs to the comment associated with that predicate. If a basic case precedes the predicate, it is generally judged to belong to the topic associated with the predicate. However, if the phrase containing the basic case is a prepositional phrase and the preposition is not one of "with", "with", "following", or "also" (as the original Chinese prepositions are glossed), the basic case is still judged to belong to the comment, because in that situation it serves as the object of the preposition, as in BA sentences and BEI sentences.
Table 5. Relation identification between basic cases and predicates (IPO)
Figure BDA00002897807900052
The relations between basic cases and predicates are identified according to the rules above.
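The position-based rule of Step 3.2 can be written as a short function. This is a sketch under assumptions: the function name `basic_case_role`, the use of integer positions, and the `exempt_prepositions` parameter are all hypothetical. The exempt set stands for the prepositions that the text excludes from the prepositional-phrase exception; it is left to the caller because the original Chinese words are not recoverable from the translated text.

```python
def basic_case_role(case_position, predicate_position,
                    preposition=None, exempt_prepositions=frozenset()):
    """Decide whether a basic case belongs to the topic or the comment.

    case_position, predicate_position: word offsets in the sentence.
    preposition: head preposition if the basic case sits inside a
        prepositional phrase, otherwise None.
    exempt_prepositions: prepositions for which the prepositional-phrase
        exception does not apply (caller-supplied assumption).
    """
    if case_position > predicate_position:
        return "comment"    # a basic case after the predicate
    if preposition is not None and preposition not in exempt_prepositions:
        return "comment"    # prepositional object placed before the predicate
    return "topic"          # default for a basic case before the predicate
```

For instance, `basic_case_role(0, 2)` returns `"topic"`, while `basic_case_role(0, 2, preposition="P")` returns `"comment"` unless `"P"` is listed in `exempt_prepositions`.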
Step 3.3, relation identification between general cases and the sentence-meaning components
The relation between a general case and the sentence-meaning components specifies which component the general case modifies. Central blocks capture the modifier-modified relations among the phrase-structure child nodes in the syntax tree. Applying the mapping rule below to a syntactic structure tree whose central blocks have been marked determines the modifier-modified relations among the components of the sentence-meaning structure.
The mapping rule is as follows:
In the syntactic structure tree, among the child nodes of a parent node, a node that is modified is a central block and a node that modifies is a non-central block. The central block node of a phrase structure replaces its parent node, and the original non-central blocks then modify the new parent node. The original child nodes of the central block node become siblings of the original non-central blocks. The original non-central blocks remain non-central blocks with respect to their new siblings, and the central and non-central relations among the original child nodes of the central block node stay unchanged.
As described above, the central block node of each phrase structure replaces its parent node. After repeated transformations, until every central block in the syntactic structure tree has been replaced in this way, each lower-level node in the final tree modifies its parent node. The result is a tree representation of the relations between general cases and the sentence-meaning components.
Table 6. Relation identification between general cases and the sentence-meaning components (IPO)
Figure BDA00002897807900061
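The mapping rule of Step 3.3 can be sketched as a recursive tree transformation in which each central block replaces its parent and the non-central siblings become its modifiers. The `Node` class and the `promote` function are hypothetical names. The sketch also assumes that every internal node has exactly one central child; the text allows several central blocks under one parent (for example when no modification relation exists), and that case is not handled here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    central: bool = True                 # central block flag relative to the parent
    children: List["Node"] = field(default_factory=list)


def promote(node: Node) -> Node:
    """Replace each phrase node by its central block, bottom-up.

    In the returned tree every child modifies its parent.
    """
    if not node.children:
        return Node(node.label, node.central)
    heads = [child for child in node.children if child.central]
    if len(heads) != 1:
        # Simplifying assumption of this sketch, not a rule from the text.
        raise ValueError(f"expected one central child under {node.label!r}")
    head = promote(heads[0])             # the central block replaces the parent
    head.central = node.central          # and inherits the parent's status
    for child in node.children:
        if not child.central:
            # Non-central blocks now modify the promoted central block and
            # sit beside the modifiers it collected at lower levels.
            head.children.append(promote(child))
    return head
```

With a toy English tree in which "runs" is the central child of VP and the non-central ADVP contains "quickly" (central) and "very" (non-central), `promote` returns "runs" with the single modifier "quickly", which in turn is modified by "very".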
Beneficial effects
Relation identification for the sentence-meaning structural model based on central blocks greatly improves the identification of basic cases, general cases, relations between basic cases and predicates, and relations between general cases and the sentence-meaning components.
The invention shows that improving the extraction and identification of framework relations in the Chinese sentence-meaning structural model can start from improving central block identification.
The techniques adopted have low computational cost and suit not only desktop computers but also mobile computing platforms such as mobile phones and tablet computers.
The invention contributes significantly to enriching and completing the sentence-meaning structural model. A more complete sentence-meaning model helps use sentence-meaning features for deeper analysis of sentence corpora, and thereby supports better results in Chinese information processing.
Description of drawings
Fig. 1 is the overall algorithm flow of the invention;
Fig. 2 is a schematic diagram of the central block identification algorithm of the invention;
Fig. 3 is a schematic diagram of the training system of the embodiment;
Fig. 4 is a schematic diagram of the identification system of the embodiment;
Fig. 5 shows basic case identification in the embodiment using the annotated central blocks directly;
Fig. 6 shows basic case identification in the embodiment based on the identified central blocks;
Fig. 7 shows general case identification in the embodiment using the annotated central blocks directly;
Fig. 8 shows general case identification in the embodiment based on the identified central blocks;
Fig. 9 shows identification of relations between basic cases and predicates in the embodiment using the annotated central blocks directly;
Fig. 10 shows identification of relations between basic cases and predicates in the embodiment based on the identified central blocks;
Fig. 11 shows identification of relations between general cases and the sentence-meaning components in the embodiment using the annotated central blocks directly;
Fig. 12 shows identification of relations between general cases and the sentence-meaning components in the embodiment based on the identified central blocks;
Fig. 13 shows extraction of the Chinese sentence-meaning structural framework in the embodiment using the annotated central blocks directly;
Fig. 14 shows extraction of the Chinese sentence-meaning structural framework in the embodiment using the identified central blocks.
Embodiment
To better illustrate the objects and advantages of the invention, embodiments of the method are described in further detail below with reference to the drawings and examples.
The experiments use the BIT-BFS annotated corpus as training and test data, 6347 entries in total. Its composition is shown in Table 7.
Table 7. Data source information used in the experiments
Figure BDA00002897807900071
The two test processes mentioned above are described one by one below. All tests were completed on the same computer, configured as follows: Intel dual-core CPU (3.0 GHz), 2.00 GB of memory, Windows XP SP3 operating system.
All experimental results here are evaluated with the PARSEVAL evaluation scheme, which mainly consists of two parts: precision (reflecting correctness) and recall (reflecting completeness). The F1 value (the harmonic mean of precision and recall) is an index that evaluates correctness and completeness together.
The experiments use the precision, recall, and F value of each single class, together with the overall accuracy, as evaluation indexes. For a class A, the precision, recall, and F value are computed as shown in formulas (5), (6), and (7).
Figure BDA00002897807900082
Figure BDA00002897807900083
Finally, the classification results of all classes are combined to obtain the overall accuracy of the algorithm, as shown in formula (8).
Figure BDA00002897807900084
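Formulas (5) to (8) are available only as images in the source, so the sketch below uses the standard definitions of per-class precision, recall, and F1, and of overall accuracy as the fraction of correctly classified items. Treating these as the patent's formulas is an assumption, and the function names are hypothetical.

```python
def class_metrics(gold, predicted, label):
    """Precision, recall, and F1 of one class (standard definitions)."""
    true_positive = sum(1 for g, p in zip(gold, predicted)
                        if g == label and p == label)
    predicted_count = sum(1 for p in predicted if p == label)
    gold_count = sum(1 for g in gold if g == label)
    precision = true_positive / predicted_count if predicted_count else 0.0
    recall = true_positive / gold_count if gold_count else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


def overall_accuracy(gold, predicted):
    """Fraction of items whose predicted class equals the gold class."""
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)
```

With gold labels `["A", "A", "B", "B"]` and predictions `["A", "B", "B", "B"]`, class A has precision 1.0 and recall 0.5, and the overall accuracy is 0.75.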
In the experiments, word segmentation uses ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System), provided by the Institute of Computing Technology of the Chinese Academy of Sciences, as the lexical analysis tool. The person-name recognition accuracy of ICTCLAS exceeds 98% (973 evaluation), and the experiments use this function directly.
1. Central block identification experiment
Step 1.1: Input sentences whose central blocks have been annotated. For these sentences, extract for every node of the syntactic structure tree the information of whether it is a central block.
Step 1.1.1: A total of 20 features (Table 8) were chosen for central block identification, of which 16 are effective and 4 are ineffective (numbers 10, 11, 12, and 17, marked in bold). The parameters of the C4.5 algorithm were chosen by grid search: with MinObjS (minimum number of instances per branch) = 2 and ConfidenceFactor (pruning confidence) = 0.5, central block identification with C4.5 reaches its highest accuracy.
Table 8. Features used for central block identification
Figure BDA00002897807900085
Figure BDA00002897807900091
Step 1.1.2: Using the classification-based algorithm, identify central blocks with each of the 20 features separately, under ten-fold cross-validation, and rank the identification results.
Step 1.1.3: Starting from all 20 features, remove the features one at a time, beginning with the feature that performs worst, until each of the 20 features has been removed. If central block identification improves or stays unchanged after a feature is removed, that feature is a negative or ineffective feature.
Step 1.1.4: Starting again from all 20 features, remove the negative or ineffective features listed above one after another. If identification performance drops after a feature is removed, skip that feature (keep it), and continue until the last negative or ineffective feature has been processed. The remaining features form the best attribute combination.
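Steps 1.1.2 to 1.1.4 describe a form of backward feature elimination, which might be sketched as below. The `evaluate` callback is a hypothetical stand-in for a ten-fold cross-validated C4.5 run that returns a score for a feature subset. Comparing each removal against the score of the current subset, rather than against the original 20-feature baseline, is an assumption where the text is ambiguous.

```python
from typing import Callable, FrozenSet, List, Sequence


def select_features(features: Sequence[str],
                    evaluate: Callable[[FrozenSet[str]], float]) -> List[str]:
    """Backward elimination following Steps 1.1.2 to 1.1.4 (sketch)."""
    all_features = frozenset(features)
    baseline = evaluate(all_features)

    # Step 1.1.2: score each feature on its own and rank, worst first.
    single = {f: evaluate(frozenset({f})) for f in features}
    worst_first = sorted(features, key=lambda f: single[f])

    # Step 1.1.3: a feature is negative or ineffective if removing it
    # from the full set does not lower the score.
    candidates = [f for f in worst_first
                  if evaluate(all_features - {f}) >= baseline]

    # Step 1.1.4: remove candidates one after another; skip a removal
    # that makes the score drop.
    current, best = all_features, baseline
    for f in candidates:
        score = evaluate(current - {f})
        if score < best:
            continue
        current, best = current - {f}, score
    return [f for f in features if f in current]
```

In practice `evaluate` would wrap model training and cross-validation, so the procedure costs on the order of a few dozen training runs for 20 features.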
Step 1.2: Central block feature extraction. Select the 16 optimal central block features.
Step 1.3: Train the model on the extracted features and the central block information of the examples to obtain a decision model. The resulting decision model guides central block identification in new sentences.
The steps above make up model training with the C4.5 decision tree.
Step 1.4: In the identification process, input a sentence that has been syntactically parsed and use the trained central block model to decide whether each node is a central block.
2. Basic case and general case identification experiment
Step 2.2.1: Identify the general cases and non-semantic cases according to the identification rules for general cases and non-semantic cases;
Step 2.2.2: Identify the non-semantic cases according to the identification rules for non-semantic cases;
Step 2.2.3: Exclude the non-semantic cases from the set of general cases and non-semantic cases to obtain the general cases;
Step 2.2.4: Identify general cases directly from the annotated central blocks, and compile statistics on the results;
Step 2.2.5: Identify central blocks under ten-fold cross-validation, identify general cases on that basis, and compile statistics on the results.
3. Predicate relation identification experiment
Step 3.1: Input syntactically parsed sentences together with the predicate relations in each sentence, for training the decision model;
Table 9. Features used for relation identification between predicates
Figure BDA00002897807900101
Figure BDA00002897807900111
Step 3.1.1: First perform central block identification with all 25 features, then remove each feature separately and compare the identification results obtained after each removal. Identification uses ten-fold cross-validation.
Step 3.1.2: Rank the identification results obtained after removing each feature, and list the features whose removal improves identification or has no effect. Starting from all 25 features, remove the listed features one after another. If identification performance drops after a feature is removed, skip that feature, and continue until the last listed feature.
Step 3.1.3: After the features removed above are taken out of the full set of 25 features, the remaining features form the best attribute combination.
Step 3.1.4: With the best feature combination obtained above, choose the parameters of the C4.5 classification algorithm by grid search: with MinObjS (minimum number of instances per branch) = 2 and ConfidenceFactor (pruning confidence) = 0.3, relation identification between predicates with C4.5 reaches its highest accuracy.
Step 3.2: Extract predicate relation features, choosing 25 features;
Step 3.3: Train the model on the extracted features and the examples of relations between predicates to obtain a decision model.
The steps above are the training process for relation identification between predicates (see Fig. 3). The identification process for a sentence to be analyzed (see Fig. 4) follows.
Step 3.4: Use the resulting decision model to guide relation identification between the predicates of new sentences.
4. Relation identification experiment for basic cases and predicates
Step 4.1: Study the rules for identifying relations between basic cases and predicates, based on sentences for which syntactic analysis and sentence-meaning analysis have been completed;
Step 4.2: Using the resulting rules, identify the relations between basic cases and predicates, given syntactically parsed sentences together with the known basic cases and predicates in each sentence;
Step 4.3: Obtain the relations between basic cases and predicates in the sentence.
Step 4.4: Identify relations between basic cases and predicates directly from the annotated central blocks, and compile statistics on the results.
Step 4.5: Identify central blocks under ten-fold cross-validation, identify relations between basic cases and predicates on that basis, and compile statistics on the results.
5. Relation experiment for general cases and sentence-meaning components
Step 5.1: Study the rules for identifying relations between general cases and the sentence-meaning components, based on sentences for which central block annotation and sentence-meaning analysis have been completed;
Step 5.2: Using the resulting rules, identify the relations between general cases and the sentence-meaning components, given sentences with identified central blocks together with the known general cases in each sentence;
Step 5.3: Obtain the relations between general cases and the sentence-meaning components.
Step 5.4: Identify relations between general cases and the sentence-meaning components directly from the annotated central blocks, and compile statistics on the results.
Step 5.5: Identify central blocks under ten-fold cross-validation, identify relations between general cases and the sentence-meaning components on that basis, and compile statistics on the results.
On the BIT-BFS annotated corpus, every algorithm was evaluated with ten-fold cross-validation. Central block identification used 5488 entries, and the other 5 algorithms all used 6347 entries. Central block identification based on the C4.5 decision tree started from 20 features, of which 16 proved effective, and reached a final accuracy of 96.42%. Rule-based basic case identification reached a precision of 90.00% and a recall of 97.15%. Rule-based general case identification reached a precision of 93.46% and a recall of 94.04%. Relation identification between predicates based on the C4.5 decision tree started from 24 features, of which 20 proved effective, and reached a final accuracy of 91.12%. Rule-based relation identification between basic cases and predicates reached a precision of 87.49% and a recall of 95.14%. Rule-based relation identification between general cases and the sentence-meaning components reached a precision of 88.52% and a recall of 86.60%.
The experimental results in Fig. 5 to Fig. 14 show that the quality of central block identification strongly affects the identification of basic cases, general cases, relations between basic cases and predicates, and relations between general cases and the sentence-meaning components. Improving the extraction and identification of framework relations in the Chinese sentence-meaning structural model can therefore start from improving central block identification. In the experiments on basic cases, general cases, relations between basic cases and predicates, and relations between general cases and the sentence-meaning components, BEI sentences were identified less well than other sentence patterns. Error analysis could therefore start from the special BEI sentence pattern, with the aim of improving identification for BEI sentences and for all sentence patterns.

Claims (9)

1. A method for extracting relations in the Chinese sentence meaning structure model, which first extracts, layer by layer, the central blocks of the syntactic structure tree, and then obtains the modification relations between phrase blocks and extracts the sentence trunk, characterized in that it performs, respectively: central block identification for each layer of phrase structure in the syntactic structure tree; semantic case identification; and identification of the relations between sentence meaning components.
2. The method for extracting relations in the Chinese sentence meaning structure model according to claim 1, characterized by the central block definition, as follows:
Central block definition: in the syntactic structure tree, among the child nodes that make up a parent node, the node in the modified position is the central block, and the nodes in the modifying position are non-central blocks. If the child nodes that make up a parent node have no modifying/modified relation among them, every child node is marked as a central block. Phrase nodes that contain a subject, predicate, object, or predicative are also defined as central blocks.
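The central block definition can be illustrated with a minimal sketch. The `Node` class, its `role` and `func` fields, and the string values used for them are hypothetical names chosen for this illustration; the source does not specify a data structure.

```python
from dataclasses import dataclass, field
from typing import List

# Grammatical functions that always make a phrase node a central block.
CORE_FUNCTIONS = {"subject", "predicate", "object", "predicative"}

@dataclass
class Node:
    label: str                  # phrase or part-of-speech tag (illustrative)
    role: str = "none"          # hypothetical: "modified", "modifier", or "none"
    func: str = ""              # hypothetical grammatical function label
    children: List["Node"] = field(default_factory=list)

def central_blocks(parent: Node) -> List[Node]:
    """Return the children of `parent` that count as central blocks."""
    has_relation = any(c.role in ("modified", "modifier") for c in parent.children)
    result = []
    for child in parent.children:
        if child.func in CORE_FUNCTIONS:
            result.append(child)        # subject/predicate/object/predicative
        elif not has_relation:
            result.append(child)        # no modification relation: all central
        elif child.role == "modified":
            result.append(child)        # the modified child is the central block
    return result

# A noun phrase whose adjective modifies its noun.
np = Node("np", children=[Node("a", role="modifier"), Node("n", role="modified")])
# A coordinate phrase whose children do not modify one another.
coord = Node("np", children=[Node("n"), Node("n")])
```

Here `central_blocks(np)` keeps only the noun, while `central_blocks(coord)` keeps both children.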
3. The method for extracting relations in the Chinese sentence meaning structure model according to claim 1, characterized by the rule for identifying modification relations between phrase blocks, as follows:
If two nodes of the syntactic structure tree stand in an upper-layer/lower-layer relation in the sentence meaning structure, the node in the upper layer is the modified element and the node in the lower layer serves as the modifier. If the nodes are in the same layer of the sentence meaning structure, no modifying/modified relation exists between them.
4. The method for extracting relations in the Chinese sentence meaning structure model according to claim 1, characterized by the central block identification module, as follows:
Central block identification uses the C4.5 decision tree as its classification algorithm and comprises two processes:
Training process: a model is trained on the basis of the central block definition;
Identification process: the trained model is used to guide central block identification for new sentences.
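C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by split information. The sketch below computes only that criterion on toy data and is not a full decision tree; the feature names `tag` and `length` and the sample rows are assumptions made for illustration, since the features actually used are not listed here.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())

def gain_ratio(rows, labels, feature):
    """C4.5 split criterion: information gain divided by split information."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Expected entropy of the labels after splitting on `feature`.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = entropy([row[feature] for row in rows])
    if split_info == 0:
        return 0.0  # the feature takes a single value and cannot split the data
    return (entropy(labels) - remainder) / split_info

# Toy training data: hypothetical features of child nodes and their labels.
rows = [
    {"tag": "n", "length": 1},
    {"tag": "a", "length": 1},
    {"tag": "n", "length": 2},
    {"tag": "a", "length": 2},
]
labels = ["central", "non-central", "central", "non-central"]
```

On this toy data `tag` separates the two classes perfectly while `length` carries no information, so a C4.5 learner would split on `tag` first.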
5. The method for extracting relations in the Chinese sentence meaning structure model according to claim 1, characterized by the basic case identification rule, as follows:
Basic case rule: if a leaf node is a central block, that leaf node must be either a basic case or a predicate. Once a leaf node has been judged to be either a basic case or a predicate, checking it against the predicates determines whether it is a basic case. The predicates are supplied as known input.
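A minimal sketch of the basic case rule follows, assuming that each leaf already carries a central block flag and that the predicates are given as word positions. The function name, the tuple layout, and the English example sentence are illustrative assumptions.

```python
def classify_leaves(leaves, predicate_positions):
    """Label each leaf as "predicate", "basic_case", or "other".

    leaves: list of (word, is_central_block) pairs in sentence order.
    predicate_positions: set of indices of the known predicates (given as input).
    """
    result = []
    for index, (word, is_central) in enumerate(leaves):
        if not is_central:
            label = "other"          # non-central leaves are left to the other rules
        elif index in predicate_positions:
            label = "predicate"      # central leaf that is a known predicate
        else:
            label = "basic_case"     # central leaf that is not a predicate
        result.append((word, label))
    return result

leaves = [("he", True), ("quickly", False), ("reads", True), ("books", True)]
labelled = classify_leaves(leaves, predicate_positions={2})
```

In this example `he` and `books` come out as basic cases, `reads` as the predicate, and `quickly` is left for the other rules.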
6. The method for extracting relations in the Chinese sentence meaning structure model according to claim 1, characterized by the general case identification rule, as follows:
General case rule: statistics show that interjections (e), modal particles (y), conjunctions (c), auxiliary words (u), nouns of locality (f), and prepositions (p) are mostly non-semantic cases. These features are used to identify the non-semantic cases, and the general cases are then identified by filtering the non-semantic cases out.
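The exclusion filter can be sketched as follows. The six tags come from the rule above; the sample tokens and the tags `t` and `d` are illustrative assumptions, and the input is assumed to be the candidates that remain once basic cases and predicates have been set aside.

```python
# Part-of-speech tags that the rule treats as (mostly) non-semantic cases.
NON_SEMANTIC_TAGS = {"e", "y", "c", "u", "f", "p"}

def general_case_candidates(tokens):
    """Keep the (word, tag) pairs whose tag is not in the non-semantic set."""
    return [(word, tag) for word, tag in tokens if tag not in NON_SEMANTIC_TAGS]

# Hypothetical tagged tokens for illustration.
tokens = [("昨天", "t"), ("在", "p"), ("的", "u"), ("非常", "d"), ("吗", "y")]
kept = general_case_candidates(tokens)
```

Because the rule says these parts of speech are mostly, not always, non-semantic, a filter like this is a heuristic and will misclassify the exceptions.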
7. The method for extracting relations in the Chinese sentence meaning structure model according to claim 1, characterized by the rule for identifying relations between predicates, as follows:
Rule for identifying relations between predicates:
1. Predicate "A" and predicate "B" are two sibling nodes under the same comment (Comment);
2. Predicate "A" and predicate "B" are the predicates of two coordinate clauses under the same comment (Comment);
3. The clause containing predicate "A" serves as a basic case of the topic (Topic) of the clause containing predicate "B";
4. The clause containing predicate "A" serves as a basic case of the comment (Comment) of the clause containing predicate "B";
5. The clauses containing predicate "A" and predicate "B" are two separate clauses of a compound sentence.
Identification of relations between predicates uses the C4.5 decision tree as its classification algorithm and comprises two processes:
Training process: a model is trained on the basis of the five classes of relations between predicates;
Identification process: the trained model is used to guide the identification of relations between predicates in new sentences.
8. The method for extracting relations in the Chinese sentence meaning structure model according to claim 1, characterized by the rule for identifying relations between basic cases and predicates, as follows:
Rule for identifying relations between basic cases and predicates:
According to the definitions of topic and comment, if a basic case follows the predicate in the sentence, the basic case belongs to the comment associated with the predicate. If a basic case precedes the predicate in the sentence, it is generally judged to belong to the topic associated with the predicate. However, if the phrase containing the basic case is a prepositional phrase, and the preposition is not "with", "with", "following", or "also", the basic case is still judged to belong to the comment, because in that situation the basic case serves as the prepositional object of a BA sentence or a BEI sentence.
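The position rule and its prepositional exception can be sketched as a single function. The exempt preposition set is passed in as a parameter; the values in `EXEMPT` below are placeholders taken from the translated glosses, not the original Chinese prepositions, and all names are assumptions of this sketch.

```python
def basic_case_role(case_pos, predicate_pos, preposition=None,
                    exempt_prepositions=frozenset()):
    """Return "topic" or "comment" for a basic case relative to its predicate.

    case_pos, predicate_pos: word positions in the sentence.
    preposition: head preposition if the basic case sits inside a
        prepositional phrase, otherwise None.
    exempt_prepositions: prepositions that do NOT trigger the exception.
    """
    if case_pos > predicate_pos:
        return "comment"             # a basic case after the predicate
    if preposition is not None and preposition not in exempt_prepositions:
        return "comment"             # e.g. the object of a BA or BEI preposition
    return "topic"                   # default for a basic case before the predicate

# Placeholder glosses only; the original prepositions are not reproduced here.
EXEMPT = frozenset({"with", "following", "also"})
```

For example, a basic case that precedes the predicate inside a BEI prepositional phrase is assigned to the comment, while the same position with an exempt preposition is assigned to the topic.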
9. The method for extracting relations in the Chinese sentence meaning structure model according to claim 1, characterized by the rule for identifying relations between general cases and sentence meaning components, as follows:
Rule for identifying relations between general cases and sentence meaning components:
In the syntactic structure tree, among the child nodes that make up a parent node, the node in the modified position is the central block, and the nodes in the modifying position are non-central blocks. The parent node of a phrase structure is replaced by its central block node, so that the original non-central blocks modify the new parent node. The original child nodes of the central block node become sibling nodes of the original non-central blocks. The original non-central blocks remain non-central blocks relative to their new siblings, and the central/non-central relations among the original child nodes of the central block node remain unchanged.
As described above, the parent node of each phrase structure is replaced by its central block node. This rotation is repeated until every central block in the syntactic structure tree has been promoted, so that in the final tree every lower-level node modifies its parent node. The result is a tree-shaped representation of the relations between general cases and sentence meaning components.
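The rotation described above can be sketched as a recursive promotion of central blocks. This is an illustrative reading of the rule, assuming exactly one central block per phrase; the `Node` class and the English example tree are hypothetical, and coordinate phrases with several central blocks are not handled.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    central: bool = False                 # True if this node is a central block
    children: list = field(default_factory=list)

def promote(node):
    """Return a new tree in which every child modifies its parent."""
    if not node.children:
        return Node(node.label, node.central)
    heads = [child for child in node.children if child.central]
    if len(heads) != 1:
        raise ValueError("sketch assumes exactly one central block per phrase")
    # The central block replaces its parent; its own modifiers come with it.
    head = promote(heads[0])
    for child in node.children:
        if child is not heads[0]:
            # Former non-central siblings now modify the promoted head.
            head.children.append(promote(child))
    return head

# [vp quickly [vp reads [np old books]]], with central blocks marked.
np = Node("np", children=[Node("old"), Node("books", central=True)])
inner = Node("vp", central=True, children=[Node("reads", central=True), np])
tree = Node("vp", children=[Node("quickly"), inner])
root = promote(tree)
```

After promotion the root is `reads`, modified by `books` and `quickly`, and `books` is in turn modified by `old`. The children are collected in promotion order, not in the original word order.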

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013100749701A CN103177089A (en) 2013-03-08 2013-03-08 Sentence meaning composition relationship lamination identification method based on central blocks

Publications (1)

Publication Number Publication Date
CN103177089A true CN103177089A (en) 2013-06-26

Family

ID=48636950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013100749701A Pending CN103177089A (en) 2013-03-08 2013-03-08 Sentence meaning composition relationship lamination identification method based on central blocks

Country Status (1)

Country Link
CN (1) CN103177089A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1290371A (en) * 1998-02-13 2001-04-04 微软公司 Segmentation of Chinese text into words
WO2002001404A2 (en) * 2000-06-27 2002-01-03 Text Analysis International, Inc. Automated generation of text analysis systems
CN1991819A (en) * 2005-12-30 2007-07-04 北京法国电信研发中心有限公司 Language morphological analyzer
CN101013421A (en) * 2007-02-02 2007-08-08 清华大学 Rule-based automatic analysis method of Chinese basic block

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈功: "基于中心块的汉语句义结构模型框架提取方法", 《北京理工大学硕士论文》, 31 December 2011 (2011-12-31) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105740238A (en) * 2016-03-04 2016-07-06 北京理工大学 Method for constructing event relationship strength graph fusing sentence meaning information
CN105740238B (en) * 2016-03-04 2019-02-01 北京理工大学 A kind of event relation intensity map construction method merging sentence justice information
CN108984666A (en) * 2018-06-29 2018-12-11 阿里巴巴集团控股有限公司 Data processing method, data processing equipment and server
CN111984778A (en) * 2020-09-08 2020-11-24 四川长虹电器股份有限公司 Dependency syntax analysis and Chinese grammar-based multi-round semantic analysis method
CN111984778B (en) * 2020-09-08 2022-06-03 四川长虹电器股份有限公司 Dependency syntax analysis and Chinese grammar-based multi-round semantic analysis method
WO2023236240A1 (en) * 2022-06-09 2023-12-14 深圳计算科学研究院 Data screening method and apparatus based on reinforcement learning

Similar Documents

Publication Publication Date Title
CN104933027B (en) A kind of open Chinese entity relation extraction method of utilization dependency analysis
Pasupat et al. Compositional semantic parsing on semi-structured tables
CN104391942B (en) Short essay eigen extended method based on semantic collection of illustrative plates
CN106649260B (en) Product characteristic structure tree construction method based on comment text mining
CN103235772B (en) A kind of text set character relation extraction method
TWI662425B (en) A method of automatically generating semantic similar sentence samples
CN106919689A (en) Professional domain knowledge mapping dynamic fixing method based on definitions blocks of knowledge
CN104484374B (en) A kind of method and device creating network encyclopaedia entry
CN105701253A (en) Chinese natural language interrogative sentence semantization knowledge base automatic question-answering method
CN105843897A (en) Vertical domain-oriented intelligent question and answer system
CN103150303B (en) Chinese semantic meaning lattice layered recognition method
CN111209412A (en) Method for building knowledge graph of periodical literature by cyclic updating iteration
CN105138864B (en) Protein interactive relation data base construction method based on Biomedical literature
CN103678278A (en) Chinese text emotion recognition method
WO2020010834A1 (en) Faq question and answer library generalization method, apparatus, and device
CN106202543A (en) Ontology Matching method and system based on machine learning
CN110472203B (en) Article duplicate checking and detecting method, device, equipment and storage medium
CN107122349A (en) A kind of feature word of text extracting method based on word2vec LDA models
CN111931506A (en) Entity relationship extraction method based on graph information enhancement
CN103176963A (en) Chinese sentence meaning structure model automatic labeling method based on CRF ++
CN111597356B (en) Intelligent education knowledge map construction system and method
CN105760462B (en) Man-machine interaction method and device based on associated data inquiry
CN113806563A (en) Architect knowledge graph construction method for multi-source heterogeneous building humanistic historical material
CN109522396B (en) Knowledge processing method and system for national defense science and technology field
CN115576984A (en) Method for generating SQL (structured query language) statement and cross-database query by Chinese natural language

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130626